Utilizarea si programarea calculatoarelor, Curs 13, Marius Minea

Files

File I/O uses functions equivalent to those used so far:
    int fputc(int c, FILE *stream);    /* writes a character to the file */
    int fgetc(FILE *stream);           /* reads a character from the file */
    /* getc, putc: same as fgetc, fputc, but they are macros */
    int ungetc(int c, FILE *stream);   /* pushes character c back */
    int fscanf(FILE *stream, const char *format, ...);
    int fprintf(FILE *stream, const char *format, ...);
    int fputs(const char *s, FILE *stream);   /* writes a string */
    int puts(const char *s);           /* writes the string, then \n, to standard output */
    char *fgets(char *s, int size, FILE *stream);
- fgets reads up to (and including) a newline, or at most size - 1 characters, and appends '\0'
  => safe reading of a line, without overflow
- it returns NULL if EOF occurs before anything has been read

    #include <stdio.h>

    void cat(FILE *fi)    /* displays an open file character by character */
    {
        int c;
        while ((c = fgetc(fi)) != EOF) putchar(c);
    }

    int main(int argc, char *argv[])
    {
        FILE *fp;
        if (argc == 1) cat(stdin);               /* read from standard input */
        else while (--argc > 0) {                /* for each argument */
            if (!(fp = fopen(*++argv, "r")))     /* open, test */
                fprintf(stderr, "can't open %s\n", *argv);
            else { cat(fp); fclose(fp); }        /* display, close */
        }
        return 0;
    }

    void clearerr(FILE *stream);   /* resets the end-of-file and error indicators of the given file */
    int feof(FILE *stream);        /* != 0: end of file reached */
    int ferror(FILE *stream);      /* != 0 on error for that file */

If a system call ended in an error, the error code can be read from the global variable
    extern int errno;
declared in errno.h. It can be used together with the function
    char *strerror(int errnum);
from string.h, which returns a string describing the error.
One can also call directly
    void perror(const char *s);    /* stdio.h */
which prints the user-supplied message s, a colon, and then the error description.

    void exit(int status);         /* stdlib.h */
terminates program execution normally:
- buffers are flushed, files are closed, temporary files are deleted
- the given integer code is returned to the operating system (see int main())

So far: functions oriented on characters, lines and formatting (text files).
To read or write a number of uninterpreted bytes (binary format):
    size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
    size_t fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);
    /* read / write nmemb objects of size bytes each */
The functions return the number of complete objects read or written correctly.
If it is smaller than the number requested, the cause is found with feof and ferror.

With them we can write our own functions for each data type:
    size_t readint(int *pn, FILE *stream)       /* binary format */
    { return fread(pn, sizeof(int), 1, stream); }
    size_t writedbl(double x, FILE *stream)     /* binary format */
    { return fwrite(&x, sizeof(double), 1, stream); }

fprintf(fp, "%d", n); writes the integer as a string of decimal digits;
fwrite writes the integer in binary format (sizeof(int) bytes).

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX 512    /* copy one sector at a time */

    int filecopy(FILE *fi, FILE *fo)
    {
        char buf[MAX];
        size_t size;                              /* number of bytes read */
        while (!feof(fi)) {
            size = fread(buf, 1, MAX, fi);        /* try to read MAX bytes */
            fwrite(buf, 1, size, fo);             /* write only as many as were read */
            if (ferror(fi) || ferror(fo)) return errno;
        }
        return 0;
    }

    int main(int argc, char *argv[])
    {
        FILE *fi, *fo;
        if (argc != 3) {
            fprintf(stderr, "usage: copy source destination\n");
            exit(1);
        }
        if (!(fi = fopen(argv[1], "rb"))) {       /* binary mode: bytes are not interpreted */
            fprintf(stderr, "%s: can't open %s: ", argv[0], argv[1]);
            perror(NULL);                         /* the message was already written */
            exit(errno);
        }
        if (!(fo = fopen(argv[2], "wb"))) {
            fprintf(stderr, "%s: can't open %s: ", argv[0], argv[2]);
            perror(NULL);
            exit(errno);
        }
        if (filecopy(fi, fo)) perror("Copy error");
        if (fclose(fi) | fclose(fo)) perror("Close error");
        return 0;
    }

Besides sequential reading and writing, positioning within a file is possible:
    long ftell(FILE *stream);                               /* position from the start of the file */
    int fseek(FILE *stream, long offset, int whence);       /* positioning */
The third parameter is the reference point for the offset:
SEEK_SET (start), SEEK_CUR (current position), SEEK_END (end).
    void rewind(FILE *stream);                              /* repositions the indicator at the start */
(equivalent to (void)fseek(stream, 0L, SEEK_SET), plus clearerr)
Repositioning is needed:
- when we want to skip over a portion of the file
- when the file has been written and we then want to go back and read from it
    int fflush(FILE *stream);
writes to the file any unwritten buffered data of the output stream stream.

The printf / scanf family can also use character strings as source or destination:
    int sprintf(char *s, const char *format, ...);
    int sscanf(const char *s, const char *format, ...);
With sprintf, the array being written to may overflow if it is not sized sufficiently.
Recommended instead:
    int snprintf(char *str, size_t size, const char *format, ...);
where writing is limited to size characters (including the final '\0') => the safe variant.

Among similar functions, choose the one that fits the situation. Example:
    int n, r; char *s, *end;
    n = atoi(s);                  /* only if we are sure of the input; does not signal errors */
    n = strtol(s, &end, 10);      /* errors can be tested (s == end) and processing can continue from end */
    r = sscanf(s, "%d", &n);      /* errors can be tested (r != 1), but the stopping point
                                     in s is not explicit (possibly obtained with %n) */
The preprocessor
- extensions (macros) for writing programs more concisely
- the preprocessor transforms the source into a proper C program
- preprocessing directives have the character # at the beginning of the line

#include <filename> or #include "filename"
- textually includes the named file (typically definitions)
  (the second form searches the current directory first, then the standard ones)

    #define LEN 20      /* the preprocessor replaces LEN with 20 everywhere */
    int tab[LEN];       /* the program has to be changed in a single place */
    for (i = 0; i < LEN; i++) ...

    #define max(A, B) ((A) > (B) ? (A) : (B))
    #define swapint(a, b) { int tmp; tmp = a; a = b; b = tmp; }

Macros are expanded textually, without interpretation => problems can appear:
- put parentheses around the arguments (avoids precedence errors)
- arguments are evaluated at every textual occurrence (for example twice in max)
  => incorrect result when expressions with side effects are evaluated repeatedly

Marius Minea, marius@cs.upt.ro, http://www.cs.upt.ro/~marius/curs/lsd/, 19 December 2016

Graph theory: the mathematical study of graphs (representing relations between objects).
From it evolved network science: the study of complex networks (computer, telecommunication, energy, biological, social):
"the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena" [US National Research Council]

Informally, a graph represents a set of objects (nodes) between which certain links exist (edges or arcs).
Formally, a graph is an ordered pair $G = (V, E)$, where $V$ is the set of nodes and $E$ (the set of edges) is a set of pairs $(u, v) \in V \times V$.
image: http://en.wikipedia.org/wiki/File:6n-graf.svg

A graph is directed if its edges are ordered pairs.
A graph is undirected if its edges are unordered pairs (the direction of traversal does not matter).
images: http://en.wikipedia.org/wiki/File:Directed.svg http://en.wikipedia.org/wiki/File:Undirected.svg
The edges of a graph form a relation $E \subseteq V \times V$ on the set of nodes.
An undirected graph can be represented by a symmetric relation:
$\forall u, v \in V.\ (u, v) \in E \rightarrow (v, u) \in E$
In a directed graph, $E$ is an arbitrary relation (it need not be symmetric, but it may be).
Conversely, a binary relation can be viewed as a directed graph: for $(u, v) \in E$ we introduce an edge $u \rightarrow v$.

A path in a graph is a sequence of edges linking a sequence of nodes $x_0, \ldots, x_n$ with $n \ge 0$ such that $(x_i, x_{i+1}) \in E$ for every $i < n$:
$x_0 \rightarrow x_1 \rightarrow \ldots \rightarrow x_{n-1} \rightarrow x_n$
Paths can be defined in both directed and undirected graphs.
A path has a start $x_0$ and an end $x_n$.
The length of a path is its number of edges; in particular it may be zero (a single node $x_0$, with no edges).

Paths of nonzero length are given by the transitive closure of the relation $E$:
$E^+ = \bigcup_{k=1}^{\infty} E^k = E \cup E^2 \cup \ldots$
The relation $E^k$ ($k \ge 1$) corresponds to paths of length $k$:
$E^2 = E \circ E = \{(u, v) \mid \exists w.\ (u, w) \in E \wedge (w, v) \in E\}$, that is, $u \rightarrow w \rightarrow v$
$E^3 = E^2 \circ E = \{(u, v) \mid \exists w.\ (u, w) \in E \wedge (w, v) \in E^2\}$, etc.

We can also define a predicate path with the properties
$\forall u, v \in V.\ (u, v) \in E \rightarrow path(u, v)$
$\forall u, v \in V.\ (\exists w \in V.\ (u, w) \in E \wedge path(w, v)) \rightarrow path(u, v)$

A cycle is a path of nonzero length whose start and end nodes are the same.
Often we work with simple cycles, in which edges and nodes do not appear more than once (except for the initial node, which is also the final one).

A graph is connected if it has a path from any node to any node (a general definition; it depends on the notion of path, in a directed or undirected graph).
For undirected graphs: a connected component is a maximal connected subgraph
- it has a path between any two of its nodes
- no other nodes could be added while keeping it connected

A graph with $n$ nodes and $e$ edges has at least $n - e$ connected components.
Proof by induction on $e$:
$e = 0$ => every node is its own connected component, so there are $n$ components.
$e \ge 1$: deleting one edge leaves a graph with $e - 1$ edges, which has at least $n - (e - 1)$ components; deleting the edge created at most one extra component, so the original graph has at least $n - e$.
Use this to prove: a tree with $n$ nodes has $n - 1$ edges.

A directed graph is weakly connected if it has a path from any node to any node when edge directions are ignored, and strongly connected if it has a directed path from any node to any node.
A strongly connected component is a maximal strongly connected subgraph.
Strongly connected components are disjoint: the relation $R(u, v)$: $path(u, v) \wedge path(v, u)$ is an equivalence relation, and the strongly connected components are its equivalence classes.
The directed graph in the figure is weakly connected. It has three strongly connected components.
image: http://en.wikipedia.org/wiki/File:Scc.png

Connected components (in an undirected graph) are equivalence classes:
- reflexivity: every node is in its own component
- symmetry: a path from $u$ to $v$ is also a path from $v$ to $u$
- transitivity: $path(u, v) \wedge path(v, w) \rightarrow path(u, w)$

We determine the connected components by going through the edges of the graph:
- initially, every node is in its own component
- for an edge $(u, v)$ we merge the components of $u$ and $v$
This can be done with a union-find structure.
Every node is alone or linked to a node with which it is equivalent => a forest of trees with links from child to parent.
find(element): gives the representative of the equivalence class (the root)
union(elem1, elem2): makes the elements equivalent (links the roots)
Example: find(X) = find(Y) = find(Z) = Z; union(Y, S) links find(Y) and find(S).

Weighted graphs: every edge has an associated numeric value (it may represent length, capacity, etc.) called its weight (cost).
Map (inexact) from Russell & Norvig, Introduction to AI.

Def: the degree of a node (in an undirected graph) is the number of edges touching the node.
An Eulerian path is a path that contains all edges of a graph exactly once.
An Eulerian cycle is a cycle that contains all edges of a graph exactly once.
A connected undirected graph has an Eulerian cycle if and only if all nodes have even degree.
A connected undirected graph has an Eulerian path (but not an Eulerian cycle) if and only if exactly two nodes have odd degree (the first and last node of the path).

Control flow graph: a representation of programs in compilers, code analyzers, etc.
nodes: instructions or linear sequences of instructions (basic blocks)
edges: describe the sequencing of instructions (control flow)
    x := a + b; y := a * b;
    while (y > a) { a := a + 1; x := a + b }
http://vinaytech.wordpress.com/2008/10/04/abstract-syntax-tree/

Call graph: we introduce an edge f -> g if function f calls g
=> the call graph is cyclic if there are (mutually) recursive functions
    let rec g n = if n = 0 then 0 else 1 + h (n-1)
    and h n = if n = 0 then 1 else 2 * g (n-1)
    let f n = g n + h n
If we identify the nodes by (consecutive) numbers, we can represent the graph as a square adjacency matrix:
M[i,j] = 1 if there is an edge from i to j, otherwise 0; or M[i,j] can hold the length / cost of the edge.
Representation by adjacency lists: for each node u, the list / set of nodes v with edges (u, v); the lists can be kept in a dictionary (node = key).

Depth-first traversal: after visiting a node, all its neighbors are traversed (recursively) if they have not been visited yet, as if the neighbors were pushed onto a stack.
Breadth-first traversal visits the nodes in order of their minimal distance from the start node (in "waves" moving away from the start node); nodes not yet visited are put in a queue.

17 January 2005, Programarea calculatoarelor 2, Curs 13, Marius Minea

Review. Common errors

signed char is an integer type (like short, int, long, long long).
char is either signed char (-128..127) or unsigned char (0..255) (unspecified which)
=> it can be used (and is converted) as an integer in expressions
digit <-> integer: '5' == '0' + 5; 7 == '7' - '0'; etc.
(digits, uppercase letters, lowercase letters: three blocks of characters in the ASCII table)
The functions from ctype.h (isalpha() etc.) return != 0 or 0, NOT 1 or 0
=> write if (isdigit(c)) and not if (isdigit(c) == 1)
The classification functions are also defined for EOF (typically -1), for which all are false.
Caution!
Right shift (>>) on signed numbers may shift in the sign bit, not 0
=> use unsigned for a well-defined effect (shifts in 0)
A character ('a', an integer value) is NOT a string ("a", an address value)
=> we can NOT write atoi('9'); strcat(s, 'b'); etc.

The standard functions need '\0' to detect the end of a string.
When validating data, test the value returned by scanf.
When recovering, flush the input buffer: while ((c = getchar()) != '\n' && c != EOF);
Declare the character as int for while ((c = getchar()) != EOF)
Test for EOF as part of the read, not before or after it. Correct:
    while (scanf("%d", &n) == 1)     (not just != 0)
    while (fgets(s, 80, stdin))
Avoid infinite loops at end of file:
    while (isspace(c = getchar()))    exits for c == EOF (condition false)
    while (!isspace(c = getchar()))   hangs at c == EOF (condition true)

Every array in C has a fixed, known size => there are no arrays of unknown size; int tab[]; does not declare a growable array.
When we access (for example fill) an array we must NOT exceed the allocated size:
- with scanf, NOT: %s or %[A-Z], but for example %19s. NOT: gets. YES: fgets
- with %s: allow 1 less than the array size (room for '\0')
- fgets automatically reads 1 less than its size parameter
  (note: %s reads a word, fgets reads a line)
- when traversing, NOT: while ((c = getchar()) != EOF) tab[i++] = c;
  (overflow of index i must be checked)

A pointer declaration: type *ptr; says that ptr will point to an object (or array) of type type, but does not yet allocate memory for it
=> we cannot use it before assigning it a memory area!
(the address of an existing variable, or a dynamically allocated area)
- array: when we know the size in advance: char s[80];
  We do NOT complicate things with: char *s; s = malloc(80); if (!s) ...
- malloc: when we know the size at the time of the call:
    printf("How many numbers? "); scanf("%d", &n); tab = malloc(n * sizeof(int));
    l = strlen(s); if ((p = malloc(l + 1))) strcpy(p, s); else ...
- realloc: when we initially did not allocate enough;
  we use the returned pointer (the memory may be moved)

The name of an array is its start address (a constant!)
=> the name of an array (including a character string) is a (constant) pointer
=> array[index] or pointer[index] is the same thing
=> char a[20], b[20]; a = b; does NOT copy arrays, it assigns addresses!
   (and gives a compilation error, because a is constant!)
s1 == s2 compares the pointers (do they coincide?), not the contents: use strcmp(s1, s2)
=> it makes NO sense to write void f(char s[20]);
   we write: void f(char tab[]) or void f(char *tab)
   (20 characters are NOT passed; the address of the array is passed)
char tab[NUM][LEN];   (if we know the maximal length of a string)
char *tab[NUM];       every element (address) must be assigned (allocated!)

Every parameter passed must have a valid, usable value!
=> a pointer passed as argument must point to a valid memory area!
- that area is used for reading or writing, depending on the function
NOT: char *p; strcpy(p, "a string");                 p is uninitialized / unallocated!
NOT: char **endptr; l = strtol(str, endptr, 10);     endptr is unallocated!
YES: char *endptr; l = strtol(str, &endptr, 10);     strtol writes a value at &endptr

A function cannot return the address of a local variable (for example an array)
- it is allocated on the stack => it disappears on exit from the function body
=> a pointer returned by a function comes from
   a) a parameter; b) a global variable (problematic: overwriting); c) dynamic allocation
A pointer returned by a function must be [...] or [...]

Marius Minea, marius@cs.upt.ro, http://www.cs.upt.ro/~marius/curs/lsd/, 8 January 2018

Breadth-first traversal starting from node $v_0$ visits the nodes in order of increasing distance from $v_0$.
Let $N_k$ = the set of nodes with a path of length [...]
[...] ($n_0$ dependent on the problem)

P = the class of problems that can be solved in polynomial time (relative to the size of the problem)
NP = the class of problems for which a solution can be verified in polynomial time
Example: satisfiability of boolean formulas
- a formula with $n$ propositions has $2^n$ assignments => exponential time when trying all of them
- a given assignment is checked in linear time (in the size of the formula):
  we traverse the formula once and obtain its value
=> satisfiability is in NP, but no polynomial algorithm is known (as far as we know it is not in P)
In general, verifying a solution is (much) simpler than finding one.

NP-complete problems: the most difficult problems in the class NP.
If one of them could be solved in polynomial time, any other problem in NP could be solved in polynomial time => we would have P = NP (believed not to be the case).
Satisfiability (SAT) is the first problem proved to be NP-complete (Cook, 1971).
There are many others (21 classic problems: Karp, 1972):
- graph coloring (how many colors so that adjacent nodes have different colors?)
- the knapsack problem (selecting objects of maximal value, with limited total weight)
- "subset-sum": does a set of integers have a subset with a given sum?

How do we prove that a problem is NP-complete (hard)?
We reduce a known NP-complete problem to the problem under study
=> if the new problem could be solved in polynomial time, then the known problem would also take polynomial time.

P versus NP: one of the most fundamental problems in computer science.
It is believed that P ≠ NP, but this has not (yet) been proved.
image: http://en.wikipedia.org/wiki/File:P_np_np-complete_np-hard.svg

We return to the question [...] from $X$ to $\mathcal{P}(X)$ [...] $|X|$ [...]
[...] $\{\ldots, s_n, \ldots\}$ and a string $w$: does it belong to the given set, $w \in S$?
The set $S$ could be given explicitly, by a regular expression, an automaton, a grammar, ...
Every set of strings defines (at least) one problem: $\mathcal{P}(\Sigma^*) \subseteq Probs$
Cantor's theorem then tells us that there are more problems than programs: $|Progs| < |Probs|$

[...] we cannot build a truth table for a formula [...] so we need to be able to prove (deduce) a formula.
The deduction $H \vdash \varphi$ of formula $\varphi$ from hypotheses $H$ is purely syntactic: a sequence of formulas, each of them an axiom or a hypothesis or resulting by an inference rule (modus ponens) from earlier formulas.
The consequence $H \models \varphi$ is a semantic notion, which also considers truth values: the hypotheses $H$ imply $\varphi$.

$e_1\ e_2$ is a function application (function $e_1$ applied to argument $e_2$), just as in ML: f x, without parentheses.
All fundamental notions (natural numbers, booleans, pairs, decision, recursion, etc.) can be expressed in lambda calculus.

[Figure: automaton with a start state]
The automaton above computes the parity of a string of 0s and 1s (which in turn may represent a number in binary): does it have an even or an odd number of 1s?
Its behavior is completely determined by the state and the input.
The automaton "knows" only the state it is in: it has finite memory.
If it has $n$ states, we could represent the state as a value on $\lceil \log_2 n \rceil$ bits.
But how do we represent a computer, which (conceptually) has no memory limit?
The Turing machine is composed of:
- a tape with an infinite number of cells; each cell contains a symbol
  (the tape can be infinite at one end or at both ends; the two are equivalent)
- a read/write head, controlled by a finite automaton
[Figure: tape, read/write head, controlling automaton]
The automaton and the contents of the tape determine the behavior.
Depending on 1) the current state and 2) the symbol under the head, the machine:
1) moves to the next state, 2) writes a (different) symbol under the head, 3) moves the head left or right.
Initially the tape holds a finite string of symbols and the head is on the leftmost one; the rest of the cells contain a special symbol (called empty or blank).

Example: how many a's are on the tape?
- obtain each bit of the number of a's: replace every second a with x
- write 0 or 1 according to the parity
- repeat until no a's are left: Halt
    bbbbaaaaaab -> bbbbxaxaxab
    bbbbxaxaxab -> bbb0xxxaxxb
    bbb0xxxaxxb -> bb10xxxxxxb
    bb10xxxxxxb -> b110xxxxxxb   Halt
(six a's; the result 110 is 6 in binary)

Formally, a Turing machine is described by a tuple with 7 elements:
$Q$: the set of states of the finite (control) automaton
$\Sigma$: the finite set of input symbols (from the initial string)
$\Gamma$: the set of tape symbols; $\Sigma \subset \Gamma$
$\delta : Q \times \Gamma \rightarrow Q \times \Gamma \times \{L, R\}$: the transition function; it gives the next state, the symbol that replaces the current one, and the move left or right (in some equivalent versions the head may also stay in place)
$q_0 \in Q$: the initial state of the control automaton
$b \in \Gamma \setminus \Sigma$: the blank symbol; all cells except a finite number are initially blank
$F \subseteq Q$: the set of final states, in which the automaton stops (halts)

It can describe any computation (implementable by a program).
There is no algorithm that decides for every automaton and input whether it halts (the halting problem); the same holds for programs.
In the formulation for programs: there is no algorithm (program) that takes an arbitrary program P and a data set D and determines whether P(D) (running P on data D) would terminate or would run forever.

Suppose such a program CheckHalt(P, D) existed.
Then CheckHalt(X, X) tells what program X does with its own text as data.
We build an "impossible program" that does the opposite of what CheckHalt says it does!
First, we define the program Test, having as input a program X:
- if CheckHalt(X, X) decides "halts", then Test loops forever
- if CheckHalt(X, X) decides "loops", then Test stops
So CheckHalt(X, X) tells what X(X) does, and Test(X) does the opposite.
Does Test(Test) halt?
The answer is given by CheckHalt(Test, Test), but Test(Test) (with X = Test) does the opposite of CheckHalt(Test, Test), so CheckHalt cannot exist!

Functions are values ("first-class values") that can be manipulated like any other values: they can be passed as arguments and returned as results.
In C we can pass / return / assign pointers to functions (for example the comparison function for qsort).
ML is statically typed (at compilation): type correctness eliminates many errors already at compile time.
Type inference: types need not be stated explicitly; they are deduced by the compiler (from the operations used).
Polymorphism: functions that can operate on families of several types (lists, trees, etc. of arbitrary types).
Cartesian product: tuples (pairs, triples), similar to structures in C (in ML, too, fields can have names).
Disjoint union: types with variants
    type 'a tree = L of 'a | T of 'a tree * 'a * 'a tree
Pattern matching: a powerful mechanism for working with compound types; the compiler checks that all cases are handled.
Functions can be composed.
Explicit dynamic allocation and deallocation are not needed: memory is managed automatically at run time ("garbage collection").
C distinguishes between "ordinary" values that can be returned (integers, reals, structures) and values represented by their memory address (arrays, strings, functions, ...).
Functions compute values; the entire program is an expression.
Sequencing is needed / useful only for writing output.
CAUTION!
expr1; expr2; expr3 discards the results of the first two expressions
=> it makes sense only if they print values (and when reading, the value must be passed on to the processing function).
ML has modules, which separate the interface from the implementation.
We have functions for sets, maps, etc. without the details of their representation being exposed.
Abstraction is important in all programming paradigms.
Example: in mathematics, a set can be given by enumerating its elements or by a property:
- an interval of values: [a, b]
- constraints: x [...]

Programarea calculatoarelor 2, Curs 14, Marius Minea

Pointers
- declaring a pointer does NOT allocate room for the object, only for the address!
- do not return the address of a local variable (for example an array) from a function
Files
- handle the error cases (as for any interaction with the outside world)
- distinguish between numbers stored in text format (a string of digits) and in binary format (as in memory); read the latter with fread into a type of the corresponding fixed size (for example uint32_t)
Utilizarea si programarea calculatoarelor, Curs 14, Marius Minea

Review. Common errors. Exercises

A function is defined by its header followed by its body:
    result-type function_name(parameter list)
    {
        /* declarations and statements of the function body */
    }
where the parameter list is either void (if there are no parameters) or
    par-type1 par_name1, par-type2 par_name2, ..., par-typen par_namen
A function declaration is just the header followed by ;
- needed if we want to use (call) the function before it is defined
- or if it is defined elsewhere (for example in a library)
- declarations from .h files:
    int abs(int n);
    int getchar(void);
    void exit(int status);
    double pow(double x, double y);
    int main(int argc, char *argv[]);
The declaration begins with void only if the function returns nothing!
A function is called as in mathematics: name(arguments)
The arguments can be expressions (including variables or constants):
    printf("%d", abs(x+2)); c = getchar(); exit(1); z = 2 + pow(x-3, 5);

In the C language, parameters are passed by value. That is:
- before the call, the values of the expressions given as arguments are computed
- at the start of the function's execution, every parameter in the header receives the value of the corresponding argument
- the parameters behave like local variables: they are not visible after the function exits, and their values are lost
    void f(int x) { x = 5; }
    int main(void) { int y = 3; f(y); return 0; }
Inside f, x changes from 3 to 5. But y does not change!
If we call f(4), we cannot make 4 equal to 5!
If we call f(y*y + 2), the program does NOT solve the equation y^2 + 2 = 5!
To return a value we use the statement return expression;

At the start of the function, every parameter has a known value: the value of the expression passed as argument.
It is wrong to write:
    void f(int x)
    {
        printf("enter the value of x");   // x is already known!
        scanf("%d", &x);                  // if x is given, why read another one?
    }
When we call f(4) it means we want to work with 4, not to read it!
If the function reads x, it does not take it as a parameter (it cannot modify it!)
NOT: void citeste(int x);
Why give the function the value of x as argument when we want to read a value?
What would a call such as citeste(5) mean?
The function must take as parameter the address of an integer, in order to write the result at that address:
    void citeste(int *px) { scanf("%d", px); }
or it must return the value (no parameters, a local variable for the result):
    int citeste(void) { int x; scanf("%d", &x); return x; }
Utilizarea si programarea calculatoarelor 2, Curs 2, Marius Minea

Text and file processing. Applications

Reading:
- a word (anything up to white space):
    char s[80]; scanf("%79s", s);
  white space = space or \f \n \r \t \v
  scanf skips initial white space and appends '\0' at the end
  A line of text cannot be read this way! From "Ana are mere" it reads only the first word: Ana
- a line, up to '\n':
    char s[80]; fgets(s, 80, stdin);
  reads at most 80 - 1 characters, including '\n', and appends '\0' at the end
stdin: identifier defined in stdio.h for the standard input file
End of file can be detected in two ways:

1. At any point in the program: int feof(FILE *fp)
   returns nonzero (true) if the end of fp has been reached, 0 otherwise
   for the input file: feof(stdin)
2. From the value returned by the input functions:
    int c; c = getchar(); if (c == EOF) /* ... */
   c must be declared int in order to test for EOF
   the value EOF (-1) differs from that of any character (0..255)
   scanf returns EOF (-1) if it meets the end of file immediately
   (but not if it managed to read at least something)
   => Use only if (scanf(...) == number of variables wanted) to test for a correct read,
      not just if (scanf(...)) (because EOF is nonzero too)
   fgets returns NULL if the file ends before anything is read
   Example: processing a file line by line
    char lin[128]; while (fgets(lin, 128, stdin)) /* process lin */

The functions in ctype.h return nonzero for a character of the desired kind and 0 otherwise (not necessarily 1 and 0)
never write if (isalpha(c) == 1), only if (isalpha(c))

Beware of infinite loops at end of file:
    int c; while (isdigit(c = getchar())) /* something */
  leaves the loop when c is not a digit, including at EOF (which is not a digit)
    int c; while (!isdigit(c = getchar())) /* something */
  loops forever at EOF, because EOF is not a digit (nor isalpha, isspace, etc.)
    while (!isdigit(c = getchar()))
      if (c == EOF) break; /* or whatever we want to do at EOF */
      else /* the rest of the processing */

skip any amount of whitespace: scanf(" ");
skip to the end of the line: scanf("%*[^\n]"); scanf("%*1[\n]");
(reads and ignores (*) any number of characters other than (^) \n, then one \n)

To process input correctly up to EOF, the first sequence below is not enough, because !feof() at the start of the loop does not guarantee a correct read.
The read itself must be tested (getchar() != EOF, fgets(...) != NULL, the value returned by (f)scanf) and the error case handled (e.g. by leaving the loop).

    while (!feof(fisier)) {
      citeste();
      prelucreaza();
    }

    for (;;) {
      if (citeste() != CORECT) break;
      prelucreaza();
    }

The end-of-file indicator is set only when a read is attempted beyond the end of the file, not when the last character has been read
- after reading the last element, feof() may or may not be true
  e.g. for a file of integers separated by spaces, feof() is set after reading the last one only if nothing else follows it (e.g. a space or \n)
  e.g. when reading line by line, feof() is set after reading the last line only if that line does not end with \n
- if feof() is false, the file may or may not contain another element

Print, one per line, all the digit sequences in the input
One approach: the text is a repetition of: group of digits, group of other characters
=> structure: two consecutive loops, inside a loop that runs until EOF
'\n' is printed at the transition between the two (preferably after the digits)
we start with the group of other characters (possibly empty)

    #include <ctype.h>
    #include <stdio.h>
    int main(void)
    {
      int c;
      do { /* EOF is also !isdigit: beware of an infinite loop! */
        while (!isdigit(c = getchar()))
          if (c == EOF) return 0;
        /* here c is certainly a digit; repeat while it is a digit */
        do putchar(c); while (isdigit(c = getchar()));
        putchar('\n'); /* digits done, c is something else, maybe EOF */
      } while (c != EOF);
      return 0;
    }

Simpler: when we see a digit, we read and print for as long as we get digits

    int main(void)
    {
      int c;
      while ((c = getchar()) != EOF)
        if (isdigit(c)) { /* first digit; continue with the rest */
          do putchar(c); while (isdigit(c = getchar()));
          putchar('\n'); /* the group of digits is done */
        } /* if it is not a digit, nothing needs to be done */
      return 0;
    }

To consider when processing sequences of characters:
- how is the sequence we are looking for defined, and what can appear between sequences?
- what must be done at the boundary between sequences (loops in the program)?
- what is the state (the current character) before and after each loop?
- EOF can occur at any time. Does the program stop correctly?
Try to view the problem (and the solution) as an automaton: at any point in the program, what can occur? what must be done?

Types defined for representing time: clock_t and time_t
(they are in fact arithmetic types, e.g. unsigned or unsigned long)

clock_t clock(void);
returns the processor time used since the program was started, in clock units given by the constant CLOCKS_PER_SEC (1 million in the POSIX standard)
it is an approximation that depends on the granularity of the real-time clock
overflow can occur (on a 32-bit system, after about 72 minutes)

time_t time(time_t *timer);
returns an arithmetic value representing the current date and time (in UNIX, the number of seconds elapsed since 1 January 1970 UTC)
if the pointer argument is non-null, the value is also stored at that address

double difftime(time_t time1, time_t time0);
returns the difference expressed in seconds, as a double

Programarea calculatoarelor 2, Curs 14, Marius Minea

time.h also defines a type with the components of a date and time:
    struct tm {
      int tm_sec;    /* seconds */
      int tm_min;    /* minutes */
      int tm_hour;   /* hours */
      int tm_mday;   /* day of the month */
      int tm_mon;    /* month */
      int tm_year;   /* year */
      int tm_wday;   /* day of the week */
      int tm_yday;   /* day in the year */
      int tm_isdst;  /* daylight saving time */
    };

time_t mktime(struct tm *tm);
computes the value represented by such a structure; fills in tm_wday and tm_yday

struct tm *gmtime(const time_t *timep);
struct tm *localtime(const time_t *timep);
convert the given time to structure form, taken as universal time (UTC) or relative to the local time zone; they return a pointer to a static area that will be overwritten with a new value by the next call

Other functions for conversion to a string: asctime(), ctime(); see the manual

The numbers generated are pseudo-random (in fact deterministic, based on an algorithm, but with a distribution that is as uniform as possible)
(truly random numbers would have to be based on physical phenomena, e.g. tossing a coin or the decay of particles)

int rand(void);
returns a pseudo-random number between 0 and RAND_MAX (at least 32767)
for a random number between 1 and N we can use 1 + rand() % N

void srand(unsigned int seed);
reinitializes the pseudo-random number generator with the given value
the next number generated by rand() starts from this value
without calling it, two runs generate the same sequence of values with rand()
it can be used e.g. as srand((unsigned)time(NULL));

- Write programs!
- Write functions not only for repeated code (avoid code duplication!), but for every logical unit with a well-defined purpose
- Programs are much more readable when they are structured into functions!
- Compile, run and test your programs as you develop them!
  Write fragments that implement the required functionality incrementally
  Do not write large programs all at once, without intermediate steps!
- Document (at least minimally) the essential variables and the functions

Representation of values
- work with sizeof (bytes!), not with assumed sizes (2, 4 bytes, etc.)
- mind the sign (>> shifts right; represented differently for int and for reals)
- mind overflow for integers and limited precision for reals

Text processing
- read with a length limit: not plain %s, %[ ], gets()
- handle the end-of-file case at every point
- do not needlessly limit the ability to handle large sizes
- process fragments that are as small as possible (character, word, line), depending on the problem

Strings and numeric conversions
- a string is defined by its start and the null terminator '\0'
- do not copy fragments needlessly; advancing a pointer is often enough
- use the standard functions for string/number conversions

Pointers
- declaring a pointer does NOT allocate room for the object, only for the address!
- do not return from a function the address of a local variable (e.g. an array)

Files
- handle the error cases (as for any interaction with the outside world)
- distinguish between numbers stored in text format (a string of digits) and in binary format (as in memory; read them with fread into a type of the matching fixed size, e.g. uint32_t)

char is an integer type (like short, int, long, long long)
char behaves either like signed char (-128..127) or like unsigned char (0..255) (implementation-defined)
=> it can be used (and is converted) as an integer in expressions
digit <-> integer: '5' == '0' + 5; 7 == '7' - '0' etc.
(digits, uppercase letters, lowercase letters: three blocks of characters in the ASCII table)
The functions in ctype.h (isalpha() etc.) return != 0 or 0, NOT 1 or 0
=> write: if (isdigit(c)) and not if (isdigit(c) == 1)
The classification functions are also defined for EOF == -1 (all false)
Warning: >> on signed numbers may shift in the sign bit, not 0
=> use unsigned for a well-defined effect (0 is shifted in)
A character ('a', an integer value) is NOT a string ("a", an address value)
=> we can NOT write atoi('9'); strcat(s, 'b'); etc.

The standard functions need \0 to detect the end of a string
When validating data, test the value returned by scanf
When recovering, empty the input buffer: while (getchar() != '\n');
Declare the character as int for while ((c = getchar()) != EOF)
Test for EOF on every read, before or after
Correct: while (scanf("%d", &n) == 1) (not just != 0)
         while (fgets(s, 80, stdin))
Avoid infinite loops at end of file:
    while (isspace(c = getchar()))     exits for c == EOF (false)
    while (!isspace(c = getchar()))    loops forever for c == EOF (true)

Every array in C has a fixed, known size => there are no arrays of unknown size
int tab[]; has one element!
When we access (e.g. fill) an array we must NOT go beyond the allocated size
- with scanf NO: %s or %[A-Z], but e.g. %19s      NO: gets   YES: fgets
- with %s: we allow 1 less than the array size (room for \0)
- fgets automatically reads 1 less than its parameter
  (note: %s reads a word, fgets reads a line)
- when traversing, NO: while ((c = getchar()) != EOF) tab[i++] = c;
  (the index i must be checked against overflow)

A pointer declaration tip *ptr; says: ptr can point to an object (or array) of type tip, but no memory has been allocated for it yet
=> we cannot use it before assigning it a memory area!
   (the address of an existing variable, or a dynamically allocated area)
- array: when we know the size in advance
    char s[80];
  do NOT complicate things: char *s; s = malloc(80); if (!s) ...
- malloc: when we know the size at the time of the call
    printf("How many numbers? "); scanf("%d", &n); tab = malloc(n * sizeof(int));
    l = strlen(s); if (p = malloc(l+1)) strcpy(p, s); else ...
- realloc: when we did not allocate enough initially
  we use the pointer it returns (it may move the memory)
Utilizarea si programarea calculatoarelor, Curs 14, Marius Minea, 18 January 2004
Review. Common errors. Exercises

A function is defined by its header followed by its body:
    tip_rezultat nume_functie ( lista_parametri )
    {
      /* declarations and statements of the function body */
    }
where the parameter list is either void (if there are no parameters), or
    tip_par1 nume_par1, tip_par2 nume_par2, ..., tip_parn nume_parn
A function declaration is just the header followed by ;
- if we want to use (call) the function before it is defined
- if it is defined elsewhere (e.g. in a library)
- declarations in .h files
    int abs(int n);   int getchar(void);   void exit(int status);
    double pow(double x, double y);   int main(int argc, char *argv[]);
The declaration starts with void only if the function returns nothing!
A function is called much as in mathematics: name(arguments)
The arguments can be expressions (incl. variables or constants)
    printf("%d", abs(x+2));  c = getchar();  exit(1);  z = 2 + pow(x-3, 5);

In the C language, parameters are passed by value. That is:
- before the call, the values of the expressions given as arguments are computed
- when the function starts executing, each parameter in the header receives the value of the corresponding argument
- the parameters in the function declaration behave like local variables: they are not visible after the function exits, and their values are lost
    void f(int x) { x = 5; }
    void main(void) { int y = 3; f(y); }
inside f, x changes from 3 to 5. But y does not change!!!
If we call f(4), we cannot make 4 equal to 5!!!
If we call f(y*y + 2), the program does NOT solve the equation y^2 + 2 = 5!!!
To return a value we use the statement return expression;

At the start of the function, each parameter has a known value: the value of the expression passed as argument
It is wrong to write:
    void f(int x) {
      printf("enter the value of x");   // x is already known!!!
      scanf("%d", &x);                  // if x is given, why would we read another one???
When we call f(4) it means we want to work with 4, not to read it!!!
If the function reads x, it does not take x as a parameter (it cannot modify it!)
NO: void citeste(int x);
Why give the function the value of x as an argument when we want to read a value? What does citeste(3) mean??
The function must take the address of an integer as parameter, in order to write the result at that address:
    void citeste(int *px) { scanf("%d", px); }
or return the value (no parameters, a local variable for the result):
    int citeste(void) { int x; scanf("%d", &x); return x; }
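The rules about parameters above (passing by value, and passing an address when the function has to write a result) can be illustrated with a short sketch. The names set_copy, set_through_pointer and read_int are hypothetical and chosen only for this example; read_int additionally reports whether the read succeeded, which the one-line versions on the slide do not do.

```c
#include <stdio.h>

/* x is a local copy of the argument: assigning to it changes only the copy. */
void set_copy(int x)
{
    x = 5;
    (void)x;              /* silence "set but not used" warnings */
}

/* px holds the address of the caller's variable, so the write is
   visible to the caller after the function returns. */
void set_through_pointer(int *px)
{
    *px = 5;
}

/* Reads an integer from standard input into *px.
   Returns 1 if a number was read, 0 otherwise (EOF or invalid input). */
int read_int(int *px)
{
    return scanf("%d", px) == 1;
}
```

With int y = 3;, the call set_copy(y) leaves y equal to 3, while set_through_pointer(&y) changes it to 5. A caller of read_int would write if (read_int(&n)) ... so that a failed read is not silently ignored.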
Marius Minea, marius@cs.upt.ro, http://www.cs.upt.ro/~marius/curs/lsd/, 9 January 2017

Can a perfect antivirus be written? (one that detects all viruses and nothing else)
Can the compiler that optimizes best be written?
Can an artificial intelligence be created that produces another, even more intelligent one?
P = the class of problems that can be solved in polynomial time (relative to the size of the problem)
NP = the class of problems for which a solution can be verified in polynomial time (verifying is easier than finding)
NP-complete problems: the hardest problems in the class NP
if one of them were solved in polynomial time, every other problem in NP could be solved in polynomial time => we would have P = NP (believed not to be the case)
Satisfiability (SAT) was the first problem proved to be NP-complete (Cook, 1971)
There are many others (21 classic problems: Karp 1972)

How do we prove that a problem is NP-complete (hard)?
We reduce a problem already known to be NP-complete to the problem being studied
=> if the new problem could be solved in polynomial time, then the known problem would take polynomial time too

P = NP? One of the most fundamental problems in computer science
It is believed that P != NP, but this has not (yet) been proved
image: http://en.wikipedia.org/wiki/File:P_np_np-complete_np-hard.svg

Unlike propositional logic, in predicate logic the number of interpretations is infinite: we can no longer build the truth table exhaustively
It is therefore essential to be able to prove a formula (starting from axioms and inference rules)
Deduction (proof) is done purely syntactically
Semantic consequence is a semantic notion, which also takes truth values into account:
let H be a set of formulas and φ a formula. We say that H implies φ (H |= φ) if every interpretation that makes all formulas in H true also makes φ true

e1 e2 is a function application (function e1 applied to argument e2)
as in ML: f 3, without parentheses
left-associative: f x y = (f x) y
All fundamental notions (natural numbers, booleans, pairs, etc.) can be expressed in lambda calculus

The Turing machine consists of:
- a tape with an infinite number of cells; each contains a symbol
  (the tape can be infinite at one end or at both ends; the two are equivalent)
- a read/write head, controlled by a finite automaton
[Figure: the tape, the read/write head, the automaton]
The automaton and the contents of the tape determine the behavior
Depending on 1) the current state and 2) the symbol under the head, the machine:
1) moves to the next state, 2) writes a (different) symbol under the head, 3) moves the head left or right
Initially, the tape holds a finite string of symbols and the head is on the leftmost one; the remaining cells contain a special symbol (called empty or blank)

How many a are on the tape? Obtain each bit of the number of a
- change every second a to x: write 0 or 1 according to the parity
- repeat until there are no more a: Halt

    bbbbaaaaaab  ->  bbbbxaxaxab
    bbb0xaxaxab  ->  bbb0xxxaxxb
    bb10xxxaxxb  ->  bb10xxxxxxb
    b110xxxxxxb  ->  b110xxxxxxb   Halt
(6 a on the tape; the result is 110 in binary)

Formally, the Turing machine is described by a tuple with 7 elements:
$Q$: the set of states of the finite (control) automaton
$\Sigma$: the finite set of input symbols (from the initial string)
$\Gamma$: the set of tape symbols; $\Sigma \subset \Gamma$
$\delta : Q \times \Gamma \to Q \times \Gamma \times \{l, r\}$: the transition function: gives the next state, the symbol that replaces the current one, and the move to the left or right (in some equivalent versions, the head can also stay in place)
$q_0 \in Q$: the initial state of the control automaton
$b \in \Gamma \setminus \Sigma$: the empty (blank) symbol: all cells except a finite number are initially blank
$F \subseteq Q$: the set of final states, in which the automaton stops (halts)
It can describe any computation (implementable by a program)

There is no algorithm that decides, for every automaton and input, whether it halts (the halting problem)
- the same holds for programs
In the formulation for programs: there is no algorithm (program) that takes an arbitrary program P and a data set D and determines whether P(D) (running P on the data D) would terminate (halt) or run forever

Suppose such a program, CheckHalt, existed
Then CheckHalt(X, X) tells what program X does with its own text as data
We build an "impossible program" that does the opposite of what it does!
First, we define the program Test, taking a program X as input:
    if CheckHalt(X, X) decides that it halts, then loop forever
    if CheckHalt(X, X) decides that it does not halt, then stop
So CheckHalt(X, X) tells what X(X) does, and Test(X) does the opposite
Does Test(Test) halt?
The answer is given by CheckHalt(Test, Test), but Test(Test) (with X = Test) does the opposite of what CheckHalt predicts, so CheckHalt cannot exist!

Utilizarea si programarea calculatoarelor, Curs 15, Marius Minea, 1 March 2005
Recursion

Recursion is a fundamental concept in mathematics and computer science
An object (a notion) is recursive if it is used in its own definition
Example from mathematics: recurrent sequences
- arithmetic progression: $x_0 = a$, $x_n = x_{n-1} + p$, for $n > 0$
- the Fibonacci sequence: $F_0 = 1$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$ for $n > 1$
It is related to iteration: both involve repetition, but in different ways:
E.g.: describing a compound object (a sequence)
- iteratively: a sequence is an element, followed by another element, followed by (...)
- recursively: a sequence is an element, or an element followed by a sequence
  (the notion being defined (the sequence) appears again in the body of the definition)
Mathematically:
$n! = 1$ for $n = 0$, and $n! = n \cdot (n-1)!$ for $n > 0$

    unsigned fact(unsigned n)
    {
      if (n == 0) return 1;
      else return n * fact(n-1);
    }
or:
    unsigned fact(unsigned n) { return n ? n * fact(n-1) : 1; }

    unsigned fact_nonrec(unsigned n)
    {
      unsigned p = 1;
      while (n > 0) { p = p * n; n = n - 1; }
      return p;
    }

=> an almost direct transcription of the mathematical formulation
the value of the factorial accumulates automatically in the returned expression
(the non-recursive version needs an extra variable p)

Note: we chose unsigned n; for int, the case n < 0 would have to be handled
(a definition cannot rely on a term, such as $x_{n-1}$, that is not yet defined)
=> every chain of recursive function calls must stop (it must not generate an infinite computation)
In general, we distinguish (much as in mathematical induction)
- a base case (for which the notion is defined directly) (e.g. $a^0 = 1$)
- a recursive step (the recursion proper) (e.g. $a^{n+1} = a^n \cdot a$)
  (the notion is defined using the same notion, but on a simpler case)
How do we make sure the recursion stops (at the base case)?
- if we have an explicit index (e.g. for sequences): when the definition for n + 1 uses only the values for smaller indices

For the directly recursive Fibonacci function, the number of calls (how many for fib(5)?) is exponential in n (very inefficient)
This is avoided by storing the intermediate values that are needed
- ordering the computation so that intermediate results are available when needed (bottom-up solution, fib_nonrec)
    unsigned fib_nonrec(unsigned n) { int i, *f, res; ... }
- computing the values as they become necessary (top-down)
    #define MAX 45
    unsigned f[MAX] = {1, 1}; /* the rest are zero */
    unsigned fib_memo(unsigned n) /* only for n < MAX */
    {
      if (f[n]) return f[n]; /* already computed */
      else return f[n] = fib_memo(n-1) + fib_memo(n-2);
      /* we store the value in f[n] before returning it */
    }
How many calls are made for fib_memo(5)?
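The question about the number of calls can be explored with a small instrumented sketch. The counters calls_plain and calls_memo and the function names fib_plain and fib_counted are hypothetical additions for this illustration; the convention $F_0 = F_1 = 1$ is the one used in the slides, and the two recursive calls are made in separate statements so that their order is fixed.

```c
#define MAX 45

unsigned memo[MAX + 1] = {1, 1};     /* F0 = F1 = 1; the rest start at zero */
unsigned long calls_plain = 0;       /* calls made by fib_plain */
unsigned long calls_memo = 0;        /* calls made by fib_counted */

/* Direct transcription of the recurrence: recomputes the same values many times. */
unsigned fib_plain(unsigned n)
{
    calls_plain++;
    if (n < 2) return 1;
    return fib_plain(n - 1) + fib_plain(n - 2);
}

/* Memoized version, valid only for n <= MAX: each value is computed once. */
unsigned fib_counted(unsigned n)
{
    unsigned a, b;
    calls_memo++;
    if (memo[n]) return memo[n];     /* already computed */
    a = fib_counted(n - 1);
    b = fib_counted(n - 2);
    return memo[n] = a + b;          /* store before returning */
}
```

Starting from fresh counters and a fresh table, fib_plain(5) makes 15 calls and fib_counted(5) makes 9, and both return 8. The gap widens quickly: the plain version grows exponentially with n, the memoized one linearly.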
Often the base case is very simple (e.g. a test and returning a value)
By comparison, the cost of a function call can be significant
=> We can handle the base case without making an extra call

    #define MAX 45
    unsigned f[MAX+1] = {1, 1}; /* the rest are zero */
    unsigned fib_r(unsigned n) /* for n >= 2 */
    {
      return f[n] = (f[n-1] ? f[n-1] : fib_r(n-1)) + (f[n-2] ? f[n-2] : fib_r(n-2));
      /* we test f[k] to decide whether to call recursively or not */
    }
    unsigned fib_main(unsigned n) /* called by the user */
    {
      if (n < 2) return 1;
      else if (n > MAX) return UINT_MAX; /* too large */
      else return fib_r(n);
    }

A string viewed recursively: the empty string, or a character followed by a string

    void puts(char *s) /* clashes with the standard puts if <stdio.h> is included */
    {
      if (*s) { putchar(*s); puts(s+1); } /* one character + the rest */
      else putchar('\n'); /* end the output with a newline */
    }

Reading a line of text into a dynamically allocated memory area

    char *getline(int n) /* initially called as getline(0) */
    {
      char c, *s; /* note: EOF is not handled here */
      if ((c = getchar()) == '\n') {
        if (!(s = malloc(n+2))) return NULL;
        s[n+1] = '\0';
      } else s = getline(n+1);
      if (s) s[n] = c;
      return s;
    }

Recursive reversal: reverse(character + rest) = reverse(rest) + character
(the processing step comes after the recursive call); reverse(empty string) = empty string

    void putrev(char *s)
    {
      if (*s) { putrev(s+1); putchar(*s); } /* else nothing */
    } /* we print \n separately after the call */

Note: we call with s+1 but leave s unchanged inside the function, NOT s++!

Reading a line of text and printing it in reverse order:

    void readrev(void)
    {
      char c; /* a distinct variable for each call */
      if ((c = getchar()) != '\n') readrev(); /* continue */
      putchar(c); /* on return, print the stored character */
    }

Note: each call has its own instance of c (the current character)!
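The rule reverse(character + rest) = reverse(rest) + character can also be applied to build the reversed string in memory instead of printing it. The sketch below is only an illustration of that rule: rev_append and reverse_string are hypothetical names, and the caller is assumed to provide an output buffer with room for strlen(s) + 1 characters.

```c
/* Writes the characters of s to out in reverse order and returns the
   position just after the last character written (no terminator added). */
char *rev_append(const char *s, char *out)
{
    if (*s == '\0')
        return out;                   /* base case: the empty string */
    out = rev_append(s + 1, out);     /* first the reverse of the rest ... */
    *out = *s;                        /* ... then the first character */
    return out + 1;
}

/* Stores the reverse of s in out, which must hold strlen(s) + 1 characters. */
void reverse_string(const char *s, char *out)
{
    *rev_append(s, out) = '\0';
}
```

For example, reverse_string("abc", buf) leaves "cba" in buf. As in putrev, the recursive call uses s + 1 and s itself is never modified.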
The idea: reduce to the same problem with fewer elements (one fewer)
Base case: the minimum of a one-element array is that element
Recursively: the minimum of the first element and the minimum of the following ones

    double min_rec(double tab[], unsigned len)
    {
      if (len == 1) return *tab; /* the only element */
      else {
        double min1 = min_rec(tab + 1, len - 1);
        return *tab < min1 ? *tab : min1;
      }
    }

Printing a number in decimal:
    void dec_print(unsigned n)
    {
      if (n > 9) dec_print(n / 10);
      putchar(n % 10 + '0');
    }
The digit is printed after the recursive call, as in the description in words
Similarly, printing a number bit by bit: void bit_print(unsigned n) ...

An increasing f with f(a) < 0 and f(b) > 0 has a root in [a, b]
- we find the half of the interval on which the sign changes and continue there
- stop: when the desired precision is reached

    #define EPS 0.001
    double f(double x) { return exp(x) - sin(x) - 1.5; } /* f(0) < 0, f(1) > 0 */
    double root(double a, double b)
    {
      double m = (a+b)/2, z;
      if (b - a < EPS) return m;
      ...
      else return root(a, m);
    }

    int fact_prod(int n, int p) /* called as fact_prod(n, 1) */
    {
      if (n > 0) { p = p * n; return fact_prod(n-1, p); }
      else return p;
    }
    int fact_nonrec(int n)
    {
      int p = 1;
      while (n > 0) { p = p * n; n = n - 1; }
      return p;
    }
When a function makes more than one recursive call, rewriting it without recursion requires a stack

    #include <ctype.h>
    #include <stdio.h>
    int main(void)
    {
      unsigned m, lo = 0, hi = (1 << 10) - 1;
      printf("Think of an integer between 0 and %u\n", hi);
      do {
        m = (lo + hi) >> 1;
        printf("Is the number greater than %u? (y/n) ", m);
        if (tolower(getchar()) == 'y') lo = m+1; else hi = m;
        while (getchar() != '\n');
      } while (lo < hi);
      ...
    }

Binary search in a sorted array a, between indices l and r, iteratively:
      ...
      if (v > a[m]) l = m+1; else r = m;
      ...
      if (v == a[l]) return l; else return -1;
and recursively:
      ...
        int m = (l+r)/2;
        if (v > a[m]) return bsrch(v, a, m+1, r);
        else return bsrch(v, a, l, m);
      } else if (v == a[l]) return l;
      else return -1;

The standard library function:
    void *bsearch(const void *key, const void *base, size_t nmemb, size_t size,
                  int (*compar)(const void *, const void *));
1 = an * a) (notiunea e definita folosind aceeasi notiune, dar pe un caz mai simplu) Cum ne asiguram de oprirea recursivitatii (la cazul de baza)? - daca avem un indice explicit (ex la siruri): cand definitia pentru n+lse foloseste doar de valorile pt indici nr de apeluri (cate pt fib(5)?) e exponential in n (f ineficient) Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Recursivitate Recursivitate 8 - prin memorarea valorilor intermediare necesare - calcul ordonat ca rezultatele intermediare sa fie disponibile cand sunt necesare (rezolvare de Jos in sus, f ib nonrec) - calculul valorilor dupa cum devin necesare (de sus in Jos) #define MAX 45 unsigned f[MaX] = {1, 1};  * restul zero *  unsigned fib memo (unsigned n)  * doar pt n deja calculat *  else return f [n] = f ib memo(n-l) + f ib memo(n-2) ;  * memoram in f[n] inainte de a returna valoarea *  } Cate apeluri se efectueaza pentru fib memo(5) ? Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Adesea, cazul de baza e f simplu (ex test si returnarea unei valori) in comparatie, costul unui apel de functie poate fi semnificativ => Putem trata cazul de baza fara a mai face un apel suplimentar #define MAX 45 unsigned f [MAX+1] = {1, 1};  * restul zero *  unsigned fib r(unsigned n)  * pentru n >= 2 *  return f [n]=(f [n-l]?f [n-1] :fib r(n-l))+(f [n-2] ?f [n-2] :fib r(n-2));  * testam f[k] pt a decide daca sa apelam recursiv sau nu *  unsigned fib main(unsigned n)  * se apeleaza de utilizator *  if (n MAX) return UiNT MAX;  * prea mare *  else return fib r(n); Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Recursivitate 9 Recursivitate 10 Un sir privit recursiv: sir vid sau caracter urmat de un sir void puts(char *s) if (*s) { putchar(*s); puts(s+l); }  * un caracter + restul *  else putchar (AnO;  * terminam afisarea cu linie noua *  Citirea unei linii de textintr-o zona de memorie alocata dinamic char *getline(int n) {  * apelam initial cu getline(O) *  char c, *s; if ((c 
= getcharO) == An O { if (!(s = malloc(n+2))) return NULL; s[n+l] = АО*; }• else s = getline(n+l); if (s) s[n] = c; return s; Utilizarea si programarea calculatoarelor Curs 15 Marius Minea inversare recursiva: invers(caracter 4- rest) = invers(rest) + caracter, (pasul de prelucrare e apelul recursiv); invers(sir vid) = sir vid; void putrev(char *s) if (*s) { putrev(s+l); putrev(*s); }  * else nimic *  }•  * tiparim  n separat dupa apel *  Obs: apelam cu s+1, dar pastram s nemodificat in functie, NU s++ ! Citirea unei linii de text si afisare in ordine inversa: void readrev(void) •  char c;  * cate o variabila distincta pentru fiecare apel *  if ((c = getcharO) != ’Xn’) readrevO ;  * continua *  putchar(c);  * la revenire, tipareste caracterul memorat *  Obs: avem o instanta diferita a lui c (caracterul curent) la fiecare apel! Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Recursivitate 11 Recursivitate 12 ideea: la aceeasi problema cu elemente mai putine (cu unul) Caz de baza: minimul unui tablou de un element e acel element Recursiv: minimul dintre primul element si minimul celor urmatoare double min rec(double tab [], unsigned len) { if (len == 1) return *tab;  * unicul element *  else { double mini = min rec(tab + 1, len - 1); return *tab 9) dec print(n 10); putchar(n 7 10 + ’0’); } Tiparirea cifrei e apelul recursiv, ca in descrierea in cuvinte Similar, tiparirea pe biti a unui numar: void bit print(unsigned n) { if (n " 1) bit print(n " 1); putchar(n & 1 ? 
’l’ : ’O’); } Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Recursivitate 13 Recursivitate   crescatoare cu  (a) 0 are radacina in [a, b] -cautam pe care jumatate a intervalului schimba semnul si continuam - oprire: cand se atinge precizia dorita #define EPS 0 001 double f(double x) { return exp(x) - sin(x) - 1 5; }  * f(0) o *  double root(double a, double b) double m = (a+b) 2, z; if (b - a 0 *  else return root(a, m);  * f(a) 0)    p = p * n; return fact prod(n-l, p); }• else return p;  * apelat cu fact prod(n, 1) *  int fact nonrec(int n) int p = 1; while (n > 0) { p = p * n; n = n - 1; return p; Pentru mai mult de un apel recursiv, e necesara folosirea unei stive Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Utilizarea si programarea calculatoarelor Curs 15 Marius Minea Recursivitate Recursivitate 18 #include void main(void) unsigned m, lo = 0, hi = (1 " 10) - 1; printf("Ganditi-va la un numar intreg intre 0 si %d n", hi); do { m = (lo + hi) " 1; printf("Numarul e mai mare decat %d ? 
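The question of how many calls the plain and the memoised Fibonacci functions make can be checked by instrumenting both. This is a small sketch under stated assumptions: the counters naive_calls and memo_calls and the function fib_naive are hypothetical additions for illustration, the table is called memo here and has MAX + 1 entries, and F_0 = F_1 = 1 as in the notes.

```c
#include <assert.h>

#define MAX 45

static unsigned memo[MAX + 1] = {1, 1};  /* F0 = F1 = 1; zero means "not computed yet" */
unsigned long naive_calls, memo_calls;    /* hypothetical counters, for illustration only */

/* Direct transcription of the recurrence: exponentially many calls. */
unsigned fib_naive(unsigned n)
{
    unsigned a, b;
    naive_calls++;
    if (n < 2) return 1;
    a = fib_naive(n - 1);
    b = fib_naive(n - 2);
    return a + b;
}

/* Top-down version with memoisation: each value is computed only once. */
unsigned fib_memo(unsigned n)
{
    unsigned a, b;
    assert(n <= MAX);
    memo_calls++;
    if (memo[n]) return memo[n];         /* already computed */
    a = fib_memo(n - 1);
    b = fib_memo(n - 2);
    return memo[n] = a + b;              /* store before returning */
}
```

Starting from a fresh table, fib_memo(5) makes 9 calls (each of the values for n = 2..5 is computed once and triggers two calls, plus the initial call), while fib_naive(5) makes 15. A second fib_memo(5) costs a single call, because the result is already in the table.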
Utilizarea si programarea calculatoarelor 2, Curs 3 (Lecture 3), Marius Minea
Abstract data types. Lists
23 March 2004

Stack: a sequence with insertion and extraction at a single end
=> the element extracted (for processing) is always the last one inserted
- the stack principle: used to implement function calls (to save the return address in the program and the local variables)
=> can be used to simulate recursion
- Example: pocket calculator for postfix expressions (the operator follows the operands; no parentheses are needed)

Queue: a sequence with insertion at one end and extraction at the other
=> the element extracted is always the oldest one inserted
=> used for sequential processing (in which the processing step may in turn produce new elements to process)
- Example: the region reachable from a point in a plane with obstacles

List = a series of elements that can be traversed sequentially, and into which elements can be inserted at the desired position.
Recursive definition: a list is either empty, or an element followed by a list.
element: the data type stored in the list (the useful information)
position: a type that identifies the location of an element in the list (possibly: a pointer)

    init(list)                           /* creates the empty list */
    empty(list): boolean                 /* is the list empty? */
    first(list): position                /* returns the first position in the list */
    next(position): position             /* the next position; the list is implicit */
    lookup(list, element): position      /* searches for the element in the list */
    insertfirst(list, element)           /* inserts at the beginning */
    insertafter(position, element)       /* inserts after the position */
    delete(list, position)               /* deletes the position from the list */

Array: a data structure for a sequence of identical elements
- offers not only sequencing, but also direct access
- implements a function from the set of indices to the set of elements
=> each element can be identified by an integer (its position in the array)
A list could be implemented with an array. But there are also drawbacks:
- if the list grows, the array may be too small (=> reallocated dynamically)
- deleting an element from the array involves
  - either moving the other elements (changes the correspondence between indices and elements; costly if the order must be preserved)
  - or marking the deleted elements (with an extra flag field) (traversal becomes inefficient if there are many deleted elements)

Set: a type similar to the list, but without ordering and without duplicates
=> implementable as a list with insertfirst, delete and a membership test

    typedef int elem_t;                  /* or another type */
    typedef struct n {
        elem_t e;                        /* the useful information */
        struct n *next;                  /* pointer to the next element */
    } node_t;                            /* name equivalent to struct n */
    typedef node_t *list_t;
    /* the list and position types are the same, namely a pointer to node_t */

Example: 2 -> 5 -> 3 -> NULL

    void init(list_t *pl) { *pl = NULL; }
    /* modifies the value of the list, so it needs a pointer to the list as parameter */
    int empty(list_t l) { return l == NULL; }
    node_t *first(list_t l) { return l; }
    node_t *next(node_t *n) { return n->next; }

    int insertfirst(list_t *pl, elem_t e)
    {
        node_t *p;
        if (!(p = malloc(sizeof(struct n)))) return 0;   /* error */
        p->e = e; p->next = *pl; *pl = p;
        return 1;                                        /* success */
    }   /* modifies the head of the list, so it takes a pointer to the list */

    int insertafter(node_t *n, elem_t e)
    {
        node_t *p;
        if (!(p = malloc(sizeof(node_t)))) return 0;     /* error */
        p->e = e; p->next = n->next; n->next = p;
        return 1;                                        /* success */
    }   /* the new node p is inserted after the old node n */

Searching, iteratively:

    node_t *lookup(list_t l, elem_t e)
    {
        for ( ; l != NULL; l = l->next)      /* search to the end */
            if (e == l->e) return l;         /* found, return the position */
        return NULL;                         /* return NULL if not found */
    }

and recursively:

    node_t *lookup(list_t l, elem_t e)
    {
        if (!l || e == l->e) return l;       /* found, or end of list */
        return lookup(l->next, e);           /* search starting with the next one */
    }

Deleting, iteratively:

    /* parameter 2 is the node to delete, possibly found with lookup */
    int delete(list_t *pl, node_t *n)        /* may change the head of the list */
    {
        node_t *p = *pl;                     /* so it takes a pointer to the list */
        if (!p || !n) return 0;              /* empty list or null node, error */
        if (p != n) {                        /* n is not the first node of the list */
            while (p && p->next != n) p = p->next;   /* find the predecessor */
            if (!p) return 0;                /* n was not found, error */
            p->next = n->next;               /* 'skip' over node n */
        } else *pl = n->next;                /* n is the first, change the head */
        free(n); return 1;                   /* free n, return success */
    }

and recursively:

    int delete(list_t *pl, node_t *n)        /* assumes n is non-null */
    {
        if (!*pl) return 0;                  /* empty list, so not found */
        if (*pl == n) { *pl = n->next; free(n); return 1; }   /* delete */
        else return delete(&(*pl)->next, n); /* try further on */
    }   /* called again with the *address* of the pointer to the next node */

Reversing a list:

    list_t reverselist(list_t head)          /* iterative version */
    {
        node_t *nxt, *rev = NULL;            /* nxt = next node, rev = reversed list */
        while (head) {                       /* link the next element to rev */
            nxt = head->next; head->next = rev;
            rev = head; head = nxt;          /* advance in the list */
        }
        return rev;
    }

    list_t reverselist(list_t rev, list_t rest)   /* recursive version */
    {
        if (!rest) return rev;
        else {
            node_t *nxt = rest->next;
            rest->next = rev;
            return reverselist(rest, nxt);
        }
    }   /* initially called as reverselist(NULL, head); */

Graph: a collection of nodes, and of edges that each connect two nodes.
Example: a set of localities with the roads between them.
A node can be linked to arbitrarily many other nodes => we use a list.

    typedef struct n {
        int id;                              /* a number for identification */
        /* other information about the node */
        struct e *edges;                     /* list of edges */
    } node_t;
    typedef struct e {
        struct n *dest;                      /* the other end of the edge */
        struct e *next;                      /* pointer to the next edge */
    } edge_t;

In the lists presented so far:
- traversal is possible in one direction only (not backwards)
- deletion requires traversing the list (even when the node to delete is given) to find the preceding node, which holds the link to the given node
Solution: keep links (pointers) to the neighbours in both directions:

    typedef struct n {
        elem_t info;                         /* the useful information in the node */
        struct n *prev, *next;               /* pointers to the neighbours */
    } node_t;

The head of the list has a null predecessor; the tail has a null successor.
Variant: circular list, linking the head and the tail of the list.

    int insertafter(node_t *p, elem_t e)
    {
        node_t *n = malloc(sizeof(node_t));
        if (!n) return 0;                    /* error, out of memory */
        n->info = e; n->prev = p; n->next = p->next;   /* link node n to its neighbours */
        if (p->next) p->next->prev = n;      /* link the neighbours to node n; */
        p->next = n;                         /* p may be the last node (non-circular list) */
        return 1;                            /* success */
    }

    void delete(list_t *pl, node_t *p)
    {   /* at most two links must be modified */
        if (p->next) p->next->prev = p->prev;   /* if p is not the last */
        if (p->prev) p->prev->next = p->next;   /* if p is not the first */
        else *pl = p->next;                     /* p was the first, change the head */
        free(p);                                /* free the memory of the deleted node */
    }
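The postfix-expression calculator mentioned as the stack example can be sketched with a singly linked list used as a stack, where push is insertfirst and pop removes the head. This is an illustrative sketch, not code from the lecture: push, pop and eval_postfix are hypothetical names, and the input format (non-negative integers and the operators + - * separated by spaces) is an assumption made here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef struct n { int e; struct n *next; } node_t;
typedef node_t *list_t;

/* push = insertfirst: the new node becomes the head of the list */
static int push(list_t *pl, int e)
{
    node_t *p = malloc(sizeof *p);
    if (!p) return 0;
    p->e = e; p->next = *pl; *pl = p;
    return 1;
}

/* pop removes the head; returns 0 if the stack is empty */
static int pop(list_t *pl, int *out)
{
    node_t *p = *pl;
    if (!p) return 0;
    *out = p->e; *pl = p->next;
    free(p);
    return 1;
}

/* Evaluates a postfix expression; returns 1 and stores the value in
   *result on success, 0 on malformed input or allocation failure. */
int eval_postfix(const char *s, int *result)
{
    list_t st = NULL;
    int a, b, ok = 1;
    assert(s != NULL && result != NULL);
    while (ok && *s) {
        if (isspace((unsigned char)*s)) {
            s++;
        } else if (isdigit((unsigned char)*s)) {
            int v = 0;
            while (isdigit((unsigned char)*s)) v = v * 10 + (*s++ - '0');
            ok = push(&st, v);               /* operand: push it */
        } else if (*s == '+' || *s == '-' || *s == '*') {
            ok = pop(&st, &b) && pop(&st, &a);   /* last pushed = right operand */
            if (ok) ok = push(&st, *s == '+' ? a + b : *s == '-' ? a - b : a * b);
            s++;
        } else {
            ok = 0;                          /* unknown character */
        }
    }
    ok = ok && pop(&st, result) && st == NULL;   /* exactly one value must remain */
    while (pop(&st, &a)) ;                   /* free anything left over */
    return ok;
}
```

For example, eval_postfix("3 4 + 2 *", &r) stores 14 in r, and "5 1 2 + 4 * + 3 -" also gives 14. No parentheses are needed because each operator applies to the two most recently pushed values.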
Utilizarea si programarea calculatoarelor, Curs 16 (Lecture 16), Marius Minea
Sorting
29 March 2005

Sorting = arranging a list of objects according to a given order relation (e.g. [...])
- an algorithm is Ω(f(n)) if its running time is > c * f(n), for n > n_0
- complexity is studied for the worst case and for the average case
- we will reason about the correctness of the algorithms using invariants

Let us generate an array of random numbers to sort.

    int rand(void);                /* in stdlib.h */

generates a pseudo-random number between 0 and RAND_MAX.

    void srand(unsigned seed);

sets the initial state of the pseudo-random number generator.
NOTE: without a call to srand, rand repeats the same generated sequence on every run.
- the generator can be initialised from the clock (time.h):

    time_t time(time_t *timer);

(time_t is an arithmetic type; the notes assume unsigned long). It returns the number of seconds elapsed since an origin date (UNIX: 1 January 1970); if the pointer parameter is non-null, the value is also stored at that address.

    const int N = 100;
    const int MAX = 1000;
    int i, a[N];
    srand((int)time(NULL));        /* initialises the generator */
    for (i = 0; i < N; i++) a[i] = rand() % MAX;

Bubble sort: the inner loop runs downward, for (...; j >= i; j--), comparing a[j] with its neighbour and swapping the two when they are out of order [...].

Selection sort: preferable if the elements are large [...]; invariant: the same as for bubble sort: after iteration i [...].

Quicksort: partition moves l and r towards each other around the pivot p and returns them through pl and pr:

    ... while (r >= i && a[r] >= p) r--;
    if (l < r) swap(&a[l], &a[r]);
    else break;
    ...
    *pl = l; *pr = r;

    void quicksort(int *a, int i, int j)
    {
        int l, r;
        partition(a, i, j, &l, &r);
        if (r > i) quicksort(a, i, r);
        if (l < j) quicksort(a, l, j);
    }
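The quicksort scheme from the sorting notes (partition around a pivot, then recurse on both parts through the same partition(a, i, j, &l, &r) interface) can be sketched as follows. The body of partition did not survive in the notes, so the version below is an assumption: it is the classic two-index partition with the middle element as pivot, not necessarily the author's. The helper swap is likewise written here for illustration.

```c
#include <assert.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Partitions a[i..j] around a pivot value p (assumed: the middle element).
   On return *pr < *pl, every element of a[i..*pr] is <= p and
   every element of a[*pl..j] is >= p. */
static void partition(int *a, int i, int j, int *pl, int *pr)
{
    int l = i, r = j, p;
    assert(i <= j);
    p = a[i + (j - i) / 2];
    while (l <= r) {
        while (a[l] < p) l++;      /* skip elements already on the correct side */
        while (a[r] > p) r--;
        if (l <= r) { swap(&a[l], &a[r]); l++; r--; }
    }
    *pl = l; *pr = r;
}

/* Sorts a[i..j] in increasing order. */
void quicksort(int *a, int i, int j)
{
    int l, r;
    if (i >= j) return;            /* zero or one element: nothing to do */
    partition(a, i, j, &l, &r);
    if (r > i) quicksort(a, i, r);
    if (l < j) quicksort(a, l, j);
}
```

A call such as quicksort(a, 0, N - 1) sorts the whole array. The two recursive calls work on disjoint parts of the array, which is why no extra storage is needed beyond the call stack.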
Utilizarea si programarea calculatoarelor, Curs 16 (Lecture 16), Marius Minea
Arbori (Trees)

Operations on the abstract type tree:

    init(tree)                               /* initialises the tree as empty (NULL) */
    parinte(tree, node): node                /* the parent of the node in the tree, or NULL */
    fiu_stang(tree, node): node              /* returns the first child, or NULL */
    frate_drept(tree, node): node            /* returns the next sibling, or NULL */
    radacina(tree): node                     /* returns the root of the tree, or NULL */
    creeaza(node, tree_1, ..., tree_k): tree /* creates a tree with the given root and subtrees */
    insereaza(tree, parentnode, newnode)     /* inserts under the parent */
    sterge(tree, node)                       /* deletes a node from a tree */

In practice we most often refer to a tree through its root node => tree and node will be the same type.

In general, we consider that the order in which the children of a node are given matters.
The nodes of a tree can be traversed (enumerated) in various ways:
- Preorder traversal
  - the root is visited first
  - then all the subtrees are traversed in turn, in preorder
- Postorder traversal
  - all the subtrees are traversed in turn, in postorder
  - then the root is visited
- Inorder traversal
  - the first (left) subtree is traversed first, in inorder
  - the root is visited
  - all the other subtrees are traversed in turn, in inorder
Note: the definitions above are recursive. Base case: to traverse the empty tree, nothing is done.

    typedef ??? node_t;                      /* we will discuss possible data structures */
    #define EMPTY ???                        /* a value for the empty tree */

    void preorder(tree_t n)                  /* the tree is given by its root */
    {
        tree_t c;
        if (n == EMPTY) return;
        visit(n);                            /* holds what must be done for each node */
        for (c = fiu_stang(n); c != EMPTY; c = frate_drept(n, c))
            preorder(c);
    }

    void postorder(tree_t n)
    {
        tree_t c;
        if (n == EMPTY) return;
        for (c = fiu_stang(n); c != EMPTY; c = frate_drept(n, c))
            postorder(c);
        visit(n);
    }

    void inorder(tree_t n)
    {
        tree_t c;
        if (n == EMPTY) return;
        if ((c = fiu_stang(n)) != EMPTY) {
            inorder(c);
            c = frate_drept(n, c);           /* move on, so the first child is not traversed twice */
        }
        visit(n);
        for (; c != EMPTY; c = frate_drept(n, c))
            inorder(c);
    }

Observations:
- the traversal procedures are written independently of the representation of the tree; they use only the operations (functions) fiu_stang and frate_drept and the value EMPTY => we have indeed defined an abstract data type
- preorder: when information must be passed from the parent to the children
- postorder: when information must be passed from the children to the parent (e.g. evaluating an expression; counting the nodes; the depth of the tree)
- inorder: e.g. for sorting with ordered binary trees

Trees can be represented, like lists, statically, with arrays, using indices to refer to the child nodes. The most common representation, however, is dynamic, with pointers:

    typedef struct n {
        /* the useful information of the node goes here */
        struct n *fiu_stang, *frate_drept;
    } node_t;                                /* node_t is a structure type synonymous with struct n */
    typedef node_t *tree_t;                  /* a tree is a pointer to a node */

Another option would be to use a separate list for the children:

    typedef struct n {
        /* the useful information */
        struct l *fii;
    } node_t;
    typedef struct l {
        struct n *nod;                       /* pointer to the node */
        struct l *next;                      /* next in the list */
    } list_t;

Sometimes a (redundant) parent pointer is also added, if the problem at hand needs fast access to the parent of a given node.
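The three traversal orders can be compared on a small tree stored in the first-child / next-sibling representation (the fiu_stang and frate_drept pointers). The sketch below makes several assumptions for the sake of a self-contained example: each node carries a one-character label, "visiting" a node appends its label to an output buffer, and the function tree_orders with its six-node sample tree is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct n {
    char label;                              /* assumed payload: one character */
    struct n *fiu_stang, *frate_drept;       /* first child, next sibling */
} node_t;
typedef node_t *tree_t;

static void preorder(tree_t n, char **out)
{
    tree_t c;
    if (n == NULL) return;
    *(*out)++ = n->label;                    /* root first */
    for (c = n->fiu_stang; c != NULL; c = c->frate_drept)
        preorder(c, out);
}

static void postorder(tree_t n, char **out)
{
    tree_t c;
    if (n == NULL) return;
    for (c = n->fiu_stang; c != NULL; c = c->frate_drept)
        postorder(c, out);
    *(*out)++ = n->label;                    /* root last */
}

static void inorder(tree_t n, char **out)
{
    tree_t c;
    if (n == NULL) return;
    c = n->fiu_stang;
    if (c != NULL) { inorder(c, out); c = c->frate_drept; }
    *(*out)++ = n->label;                    /* root after the first subtree */
    for (; c != NULL; c = c->frate_drept)
        inorder(c, out);
}

/* Builds the tree A(B(E, F), C, D) and writes its three traversals
   into the given buffers, each of which must hold at least 7 bytes. */
void tree_orders(char *pre, char *post, char *in)
{
    node_t f = {'F', NULL, NULL}, e = {'E', NULL, &f};
    node_t d = {'D', NULL, NULL}, c = {'C', NULL, &d};
    node_t b = {'B', &e, &c},     a = {'A', &b, NULL};
    char *p;
    p = pre;  preorder(&a, &p);  *p = '\0';
    p = post; postorder(&a, &p); *p = '\0';
    p = in;   inorder(&a, &p);   *p = '\0';
    /* sanity check: every traversal visits all six nodes exactly once */
    assert(strlen(pre) == 6 && strlen(post) == 6 && strlen(in) == 6);
}
```

For this tree the preorder is ABEFCD, the postorder is EFBCDA and the inorder is EBFACD. Only the position of the "visit" step relative to the recursive calls differs between the three functions.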
calculatoarelor Curs 16 Marius Minea Caz particular in care orice nod are doi fii: fiul stang si cel drept -in general: oricare (sau amandoi) pot lipsi (= arbore vid) - uneori: arbore binar propriu-zis: fiecare nod are 0 sau 2 fii typedef struct n {  * informatia utila din nod *  struct n *left, *right; } node t; Exemple: - reprezentarea unei expresii: nodurile intermediare contin operatori, nodurile frunza contin valori; calcul prin parcurgere in postordine (operatorii unari vor avea subarborele drept vid) - arbori de decizie binari, pentru reprezentarea functiilor boolene noduri intermediare: etichetate cu variabile; nodurile terminale: 0 si 1 arborele stang: valoarea functiei cand variabila respectiva e 0 arborele drept: valoarea functiei cand variabila respectiva e 1 Utilizarea si programarea calculatoarelor Curs 16 Marius Minea Arbori Arbori 10 Fiecare nod are o cheie (valoare) a unui tip ordonat (intreg, real, sir) Pentru fiecare nod c din subarborele n ieft avem c key = n key Folositi pentru a pastra o multime de elemente, ordonata dupa chei, intr-o structura flexibila (nu tablou fix), cu cautare modificare rapida typedef int key t;  * sau alt tip ordonat *  typedef struct n { key t key;  * sau un alt tip ordonat *  struct n *left, *right; } node t; Utilizarea si programarea calculatoarelor Curs 16 Marius Minea node t *search(node t *n, key t key) { if (!n) return NULL;  * arbore nul, cheia nu s-a gasit *  else if (key == n->key) return n;  * gasit, returneaza nodul *  else if (key key) return search(n->left, key); else return search(n->right, key);  * cauta intr-un subarbore *  void insert(node t **n, key t key) {  * poate modifica *n *  while (*n)  * cauta un loc gol potrivit *   * varianta care accepta duplicate, inserate la stanga *   * fara duplicate: se iese la test de egalitate *  if (key key) n = &(*n)->left;  * cauta la stanga *  else n = &(*n)->right;  * cauta la dreapta *  if (!(*n = malloc(sizeof(node t)))) return;  * aloca nodul *  (*n)->left = 
(*n)->right = NULL;  * noul nod e terminal *  (*n)->key = key; Utilizarea si programarea calculatoarelor Curs 16 Marius Minea Arbori Arbori 12 void delete (node t **n, key t key) {  * poate modifica *n *  while (*n) { if (key == (*n)->key) {  * sterge, caz simplu: 1 fiu *  node t *p = *n; if (!(*n)->left) *n = (*n)->right; else if (!(*n)->right) *n = (*n)->left; else {  * 2 fii coboara spre dreapta in cel stang *  do n = &(*n)->right while (*n); *n = p->right;  * insereaza subarborele drept la *n *  free(p); return;  * elibereaza memoria pentru nodul sters *  }• else if (key key) n = &(*n)->left; else n = &(*n)->right; Utilizarea si programarea calculatoarelor Curs 16 Marius Minea - se creeaza un arbore binar ordonat (vid) - se insereaza pe rand elementele de sortat - se parcurge arborele in inordine => se obtin elementele in ordine Complexitate: - toate operatiile (cautare, insertie, sortare) au complexitate liniara in adancimea h a arborelui -in cazul ideal (si mediu), h  og i (nr de noduri) -in cazul defavorabil: h = n (deja sortat => arborele devine lista) - sortarea e О(иіоди) in medie, dar poate fi O("2) Solutie: diverse tipuri de arbori binari echilibrati Utilizarea si programarea calculatoarelor Curs 16 Marius Minea 5 aprilie 2004 Utilizarea si programarea calculatoarelor Curs 16 Marius Minea Arbori 2 ne permit sa structuram ierarhic o multime de elemente - structura de directoare si fisiere intr-un calculator - arborele genealogic (o persoana, parintii, bunicii, strabunicii, etc ) -structura ierarhica pt organizatie (director, sefi departamente, etc ) - circuite logice, expresii aritmetice logice, baze de date strucurate Un arbore e format din , din care unul e Fiecare nod in afara de radacina are un nod => ierarhizare Definitie recursiva: - un arbore e fie un singur nod n (care reprezinta si radacina arborelui) - sau un nod n, impreuna cu arborii ТІ5 , Tk ai caror radacini ni, , il au pe n ca parinte Nodurile щ sunt lui n, iar arborii T- sunt lui n Un nod 
without children is also called a leaf (terminal) node.
Sometimes the definition also includes the empty tree, which has no nodes.

Operations on the abstract tree type
• init(arbore)                         /* initializes the tree as empty (NULL) */
• parinte(arbore, nod): nod            /* the parent of the node in the tree, or NULL */
• fiu_stang(arbore, nod): nod          /* returns the first child, or NULL */
• frate_drept(arbore, nod): nod        /* returns the next sibling, or NULL */
• radacina(arbore): nod                /* returns the root of the tree, or NULL */
• creeaza(nod, arbore_1, ..., arbore_k): arbore   /* creates a tree with the given root and subtrees */
• insereaza(arbore, nodparinte, nodnou)   /* inserts the new node under the given parent */
• sterge(arbore, nod)                  /* deletes a node from a tree */
In practice we most often refer to a tree through its root node => tree and node will be the same type.

In general, the order in which the children of a node are given is considered significant.
The nodes of a tree can be traversed (enumerated) in several ways:
• Preorder traversal
  - first visit the root
  - then traverse all the subtrees in turn, in preorder
• Postorder traversal
  - traverse all the subtrees in turn, in postorder
  - then visit the root
• Inorder traversal
  - first traverse the first (left) subtree, in inorder
  - visit the root
  - traverse all the other subtrees in turn, in inorder
Note: the definitions above are recursive. Base case: to traverse the empty tree, nothing is done.

typedef ??? node_t;     /* possible data structures are discussed below */
#define EMPTY ???
                        /* a value for the empty tree */

void preorder(tree_t n) {       /* the tree is given by its root */
    tree_t c;
    if (n == EMPTY) return;
    visit(n);                   /* whatever must be done for each node */
    for (c = fiu_stang(n); c != EMPTY; c = frate_drept(n, c))
        preorder(c);
}

void postorder(tree_t n) {
    tree_t c;
    if (n == EMPTY) return;
    for (c = fiu_stang(n); c != EMPTY; c = frate_drept(n, c))
        postorder(c);
    visit(n);
}

void inorder(tree_t n) {
    tree_t c;
    if (n == EMPTY) return;
    if ((c = fiu_stang(n)) != EMPTY) inorder(c);
    visit(n);
    if (c != EMPTY)             /* the first child has already been traversed */
        for (c = frate_drept(n, c); c != EMPTY; c = frate_drept(n, c))
            inorder(c);
}

Notes:
- the traversal procedures are written independently of the tree representation; they use only the operations (functions) fiu_stang and frate_drept and the value EMPTY => we have indeed defined an abstract data type
- preorder: when information must be passed from the parent to the children
- postorder: when information must be passed from the children to the parent (e.g. evaluating an expression; counting the nodes; the depth of the tree)
- inorder: e.g. for sorting with ordered binary trees

Like lists, trees can be represented statically, with arrays, using indices to refer to the child nodes.
The most common representation, however, is dynamic, with pointers:

typedef struct n {
    /* the useful information goes here */
    struct n *fiu_stang, *frate_drept;
} node_t;                       /* node_t is a structure type, synonym for struct n */
typedef node_t *tree_t;         /* a tree is a pointer to a node */

Another option is to use a separate list for the children:

typedef struct n {
    /* the useful information */
    struct l *fii;
} node_t;
typedef struct l {
    struct n *nod;              /* pointer to the node */
    struct l *next;             /* next in the list */
} list_t;

Sometimes a (redundant) parent pointer is also added, if the problem at hand needs fast access to the parent of a given node.
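
The postorder uses mentioned above (counting the nodes, the depth of the tree) can be sketched on the pointer representation with fiu_stang and frate_drept fields. This is only an illustration under stated assumptions: the int payload and the helper names new_node, add_child, count_nodes and depth are hypothetical and do not come from the lecture.

```c
#include <assert.h>
#include <stdlib.h>

/* First-child / next-sibling node, as in the dynamic representation above. */
typedef struct n {
    int info;                           /* payload; an int is assumed here */
    struct n *fiu_stang, *frate_drept;
} node_t;
typedef node_t *tree_t;

/* Hypothetical helper: allocate a node with no children and no siblings. */
tree_t new_node(int info) {
    tree_t t = malloc(sizeof(node_t));
    if (t) { t->info = info; t->fiu_stang = t->frate_drept = NULL; }
    return t;
}

/* Hypothetical helper: make child the new first child of parent. */
void add_child(tree_t parent, tree_t child) {
    assert(parent != NULL && child != NULL);
    child->frate_drept = parent->fiu_stang;
    parent->fiu_stang = child;
}

/* Postorder: the counts of the subtrees are combined at the parent. */
int count_nodes(tree_t n) {
    int total = 0;
    if (n == NULL) return 0;
    for (tree_t c = n->fiu_stang; c != NULL; c = c->frate_drept)
        total += count_nodes(c);
    return total + 1;                   /* the root is counted last */
}

/* Postorder: depth = 1 + the largest depth among the children (0 if empty). */
int depth(tree_t n) {
    int best = 0;
    if (n == NULL) return 0;
    for (tree_t c = n->fiu_stang; c != NULL; c = c->frate_drept) {
        int d = depth(c);
        if (d > best) best = d;
    }
    return best + 1;
}
```

Both functions touch the representation only through the first-child and next-sibling links, so they would carry over unchanged to any representation offering those two operations.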
Binary trees: the special case in which every node has two children, the left child and the right child
- in general: either one (or both) may be missing (= empty tree)
- sometimes: a proper binary tree, in which every node has 0 or 2 children

typedef struct n {
    /* the useful information in the node */
    struct n *left, *right;
} node_t;

Examples:
- representing an expression: internal nodes hold operators, leaf nodes hold values; evaluation by postorder traversal (unary operators have an empty right subtree)
- binary decision trees, for representing Boolean functions
  internal nodes: labeled with variables; terminal nodes: 0 and 1
  left subtree: the value of the function when that variable is 0
  right subtree: the value of the function when that variable is 1

Ordered binary trees
Each node has a key (value) of an ordered type (integer, real, string).
For every node c in the subtree n->left we have c->key <= n->key (and symmetrically for n->right).
They are used to keep a set of elements, ordered by key, in a flexible structure (not a fixed array), with fast search and modification.

typedef int key_t;              /* or another ordered type */
typedef struct n {
    key_t key;                  /* the key */
    struct n *left, *right;
} node_t;

node_t *search(node_t *n, key_t key) {
    if (!n) return NULL;                            /* empty tree, key not found */
    else if (key == n->key) return n;               /* found, return the node */
    else if (key < n->key) return search(n->left, key);
    else return search(n->right, key);              /* search in a subtree */
}

void insert(node_t **n, key_t key) {                /* may modify *n */
    while (*n)                                      /* look for a suitable empty place */
        /* variant that accepts duplicates, inserted to the left */
        /* without duplicates: exit on an equality test */
        if (key <= (*n)->key) n = &(*n)->left;      /* search left */
        else n = &(*n)->right;                      /* search right */
    if (!(*n = malloc(sizeof(node_t)))) return;     /* allocate the node */
    (*n)->left = (*n)->right = NULL;                /* the new node is a leaf */
    (*n)->key = key;
}

void delete(node_t **n, key_t key) {                /* may modify *n */
    while (*n) {
        if (key == (*n)->key) {
            node_t *p = *n;                         /* the node to delete */
            if (!p->left) *n = p->right;            /* simple case: at most one child */
            else if (!p->right) *n = p->left;
            else {                                  /* two children */
                *n = p->left;                       /* the left subtree takes the place of p */
                do n = &(*n)->right;                /* descend to the right in the left subtree */
                while (*n);
                *n = p->right;                      /* attach the right subtree at the free link */
            }
            free(p);                                /* free the memory of the deleted node */
            return;
        } else if (key < (*n)->key) n = &(*n)->left;
        else n = &(*n)->right;
    }
}

Sorting with ordered binary trees
- create an (empty) ordered binary tree
- insert the elements to be sorted, one by one
- traverse the tree in inorder => the elements are obtained in order

Complexity:
- the basic operations (search, insertion, deletion) have complexity linear in the depth h of the tree
- in the ideal (and average) case, h ~ log n (n = number of nodes)
- in the worst case: h = n (input already sorted => the tree degenerates into a list)
- sorting is O(n log n) on average, but can be O(n^2)
Solution: various kinds of balanced binary trees

Computer Use and Programming 2, Lecture 17, Marius Minea, 12 April 2005

Lists

List = a sequence of elements
- that we can traverse sequentially (from beginning to end)
- into/from which we can insert and delete elements at a given position

Take the list of integers 5, 2, 3, 6. We can implement it with an array: int a[] = {5, 2, 3, 6};
But:
- it has 4 elements, with no room for others (e.g. to insert 7 after 2)
  (we can declare a larger array: int a[10]; but it will still fill up)
- to delete a[1]=2 from the list, the elements after it must be moved
=> a simple array implementation is neither efficient nor flexible
=> we must represent the linking that underlies the notion of a list
(how do we get from one element to the next?
in an array: index++)
- we use pointers and store in each element the address of the next one

typedef int elem_t;             /* the type of the list elements */
typedef struct n {
    elem_t e;                   /* field with the useful information: the integer */
    struct n *next;             /* field with the address of the next element */
} node_t;                       /* node_t is a name equivalent to struct n */
typedef node_t *list_t;         /* a list is the address of an element (the first one) */

[Figure: a linked list of nodes, the last one pointing to NULL]

The empty list is represented as a null pointer => we initialize the list: list_t l = NULL;

node_t *insertafter(node_t *n, elem_t e)
{                               // assumes n is a valid node, not NULL
    node_t *p;                  // address of the new node, which must be allocated!
    if (!(p = malloc(sizeof(node_t)))) return NULL;     // error
    p->e = e;                   // fill in the element of the new node
    p->next = n->next;          // link the new node to the successor of the old one
    n->next = p;                // link the old node to the newly created node
    return p;                   // return the newly created node
}                               // the new node p has been inserted after the old node n

node_t *insertfirst(node_t *n, elem_t e)
{
    node_t *p;                  /* address of the new node, which must be allocated! */
    if (!(p = malloc(sizeof(node_t)))) return NULL;     /* error */
    p->e = e;                   /* fill in the element of the new node */
    p->next = n;                /* link the new node in front of the old head */
    return p;                   /* the new node, now the head of the list */
}

Note: we cannot insert in the same way before an arbitrary element n of the list: we do not know which is the previous element, the one holding the link to n (the link that must be changed to point to the new node p)
- we need to know the element before n
- the same holds for deletion

void deleteafter(node_t *n)     // deletes the node after n
{                               // assumes n is a valid node, not NULL
    node_t *p = n->next;        // the node to be deleted
    if (p == NULL) return;      // nothing to delete
    n->next = p->next;          // take p out of the list (before, n->next == p)
                                // and link n to the successor of the deleted node p
    free(p);                    // free the memory of the deleted node
}

node_t *lookup(list_t l, elem_t e)
{
    for ( ; l != NULL; l = l->next)         // search to the end
        if (e == l->e) return l;            // found: return the node (position)
    return NULL;                            // not found: return NULL
}

The search is also easy to implement if we view the list as a recursive object.
Definition: a list is either the empty list, or an element followed by a list.

node_t *lookup(list_t l, elem_t e)
{
    if (l == NULL) return NULL;             // not found: empty list
    else if (e == l->e) return l;           // found at the current position
    else return lookup(l->next, e);         // keep searching
}

With the functions above we can write programs that use lists. But it is useful not to have to rewrite the functions every time => we can create a library for working with lists.
We put the necessary type and function declarations in a file lista.h:

typedef int elem_t;             // the type of the list elements
typedef struct n node_t;        // incomplete declaration:
                                // tells us that node_t is a structure type, without specifying
                                // its contents
typedef node_t *list_t;         // the list type is the address of a node

We put the complete declaration of the structure type and the function definitions in a file lista.c, which can be compiled separately and then linked with the program that uses the functions.

list_t reverselist(list_t head)             /* iterative variant */
{
    node_t *nxt, *rev = NULL;               /* nxt = next node, rev = reversed list */
    while (head) {                          /* link the next element onto rev */
        nxt = head->next;
        head->next = rev;
        rev = head;
        head = nxt;                         /* advance in the list */
    }
    return rev;
}

list_t reverselist(list_t rev, list_t rest) /* recursive variant */
{
    if (!rest) return rev;
    else {
        node_t *nxt = rest->next;
        rest->next = rev;
        return reverselist(rest, nxt);
    }
}                                           /* initially we call reverselist(NULL, head); */
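
To see the list functions working together, here is a small self-contained sketch. insertfirst, lookup and the iterative reverselist follow the lecture; nth and freelist are hypothetical helpers added only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef int elem_t;
typedef struct n { elem_t e; struct n *next; } node_t;
typedef node_t *list_t;

/* Insert e in front of list n; returns the new head, or NULL on failure. */
node_t *insertfirst(node_t *n, elem_t e) {
    node_t *p = malloc(sizeof(node_t));
    if (!p) return NULL;
    p->e = e;
    p->next = n;
    return p;
}

/* Linear search; returns the first node holding e, or NULL. */
node_t *lookup(list_t l, elem_t e) {
    for (; l != NULL; l = l->next)
        if (l->e == e) return l;
    return NULL;
}

/* Iterative in-place reversal; returns the new head. */
list_t reverselist(list_t head) {
    node_t *nxt, *rev = NULL;
    while (head) {
        nxt = head->next;
        head->next = rev;
        rev = head;
        head = nxt;
    }
    return rev;
}

/* Hypothetical helper: element at position i (0-based).
   The list must have more than i elements. */
elem_t nth(list_t l, int i) {
    while (i-- > 0) { assert(l != NULL); l = l->next; }
    assert(l != NULL);
    return l->e;
}

/* Hypothetical helper: free every node of the list. */
void freelist(list_t l) {
    while (l) { node_t *nxt = l->next; free(l); l = nxt; }
}
```

Building the list 5, 2, 3, 6 means calling insertfirst with 6, 3, 2, 5 in that order, since each call puts the new element in front; reversing it then yields 6, 3, 2, 5 without allocating any new node.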
Computer Use and Programming, Lecture 17, Marius Minea, 27 April 2004

Backtracking search

To solve a problem:
- we must choose the right solution idea (the algorithm)
- and the corresponding data structures on which it operates
Some general techniques:
- Backtracking search: when the solution can be found only by trying all the possible solutions
- The greedy method: when the solution is obtained by making the "best" move at each step
- Decomposition into subproblems (divide and conquer): by splitting the problem into several similar, smaller problems
- Dynamic programming: also through smaller subproblems, but usually with overlapping parts
Note: recursion is a more fundamental element than the methods above (all of them can be implemented recursively).

We need:
- a recursive search procedure
- a (global) data structure in which we store the solution

procedure cauta(s: solution)
    if the solution is complete, print it; return
    for each possible continuation
        add the continuation to the solution
        cauta(extended solution)
        remove the continuation from the solution
Note: this variant prints all the solutions. To print only one, we change the procedure so that it returns whether a solution was found, and we break out of the search loop if so.

Color a graph with a given number of colors, so that adjacent nodes have different colors.
procedure cauta()
    if all nodes are colored, print; return
    choose an uncolored node n
    for each color c
        color n with c
        if no neighbor is colored with c, cauta()
        uncolor n

Print, in increasing order, all the permutations of the numbers from 1 to n.
Solution: for each position, from 1 to n, we choose in turn each of the numbers not yet chosen, and continue with the next position.
procedure alege(p: position)
    if p > n, print the permutation
    for i from 1 to n
        if i is free
            put i at position p
            mark i as chosen
            alege(p + 1)
            mark i as free

In a (directed) graph, is there a path from a node s to a node f?
Solution: we traverse the graph, with backtracking, starting from s, until we find f or exhaust all the paths. We stop extending a path when it would close a cycle.
path: a sequence of nodes
procedure cauta(n: node)
    if n is f, print the current path; stop
    for each successor u of n
        if u is not on the current path
            add u to the current path
            cauta(u)
            remove u from the current path
main program
    initialize the path with s
    cauta(s)
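
The path search above can be sketched in C as follows. The adjacency-matrix representation, the bound MAXN, the global names and the wrapper find_path are assumptions made for this illustration; the lecture only gives the pseudocode. Instead of printing and stopping, cauta returns 1 as soon as f is reached, which is the "report whether a solution was found" variant mentioned earlier.

```c
#include <assert.h>

#define MAXN 16

int nnodes;                 /* number of nodes in the graph */
int adj[MAXN][MAXN];        /* adj[u][v] != 0: there is an edge u -> v */
int path[MAXN];             /* the current path (the global solution) */
int len;                    /* number of nodes on the current path */
int on_path[MAXN];          /* on_path[u] != 0: u is on the current path */

/* Returns 1 if f can be reached from n by extending the current path. */
int cauta(int n, int f) {
    if (n == f) return 1;                   /* found: path[0..len-1] is the answer */
    for (int u = 0; u < nnodes; u++) {
        if (adj[n][u] && !on_path[u]) {     /* taking u does not close a cycle */
            path[len++] = u;
            on_path[u] = 1;
            if (cauta(u, f)) return 1;      /* stop at the first solution */
            len--;                          /* backtrack: remove u again */
            on_path[u] = 0;
        }
    }
    return 0;
}

/* Main-program step: start the path with s, then search. */
int find_path(int s, int f) {
    assert(s >= 0 && s < nnodes && f >= 0 && f < nnodes);
    for (int u = 0; u < nnodes; u++) on_path[u] = 0;
    len = 0;
    path[len++] = s;
    on_path[s] = 1;
    return cauta(s, f);
}
```

When find_path returns 1, the nodes of the path found are path[0] through path[len-1]. Because a node is never taken twice on the same path, the recursion depth is at most the number of nodes.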
Computer Use and Programming, Lecture 18, Marius Minea, 11 May 2004

Graph search. Separate compilation

In the path search from the previous lecture, the search continues immediately from the last node reached, that is, the graph is traversed as far "in depth" as possible. There are, however, other traversal strategies.
A general procedure for traversing all the nodes:
procedure parcurge
    n = initial node
    mark n as reached
    put n in the waiting list
    while the waiting list is not empty
        take a node n out of the list
        visit n         /* do the necessary processing */
        for each successor s of n
            if s has not been reached
                mark s as reached
                put s in the waiting list
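
As a minimal sketch of the general procedure, the version below takes the waiting list to be a FIFO queue held in an array; this choice, the adjacency matrix, the bound MAXN and the dist array are assumptions of the example, not part of the pseudocode. The dist array doubles as the "reached" mark: -1 means the node has not been reached yet.

```c
#include <assert.h>

#define MAXN 16

/* Traverse the graph from start, using a FIFO waiting list.
   dist[v] receives the number of edges from start to v, or -1 if v is
   never reached. Returns the number of nodes visited. */
int parcurge(int nnodes, int adj[MAXN][MAXN], int start, int dist[MAXN]) {
    int queue[MAXN];            /* each node enters the list at most once */
    int head = 0, tail = 0, visited = 0;

    assert(nnodes <= MAXN && start >= 0 && start < nnodes);
    for (int v = 0; v < nnodes; v++) dist[v] = -1;      /* nothing reached yet */
    dist[start] = 0;                                    /* mark start as reached */
    queue[tail++] = start;
    while (head < tail) {                               /* waiting list not empty */
        int n = queue[head++];                          /* take a node out */
        visited++;                                      /* "visit" n */
        for (int s = 0; s < nnodes; s++)
            if (adj[n][s] && dist[s] < 0) {             /* successor not reached */
                dist[s] = dist[n] + 1;                  /* mark it as reached */
                queue[tail++] = s;
            }
    }
    return visited;
}
```

Taking nodes out at the same end where they are put in (a stack instead of a queue) would change only the order of the visits, not the set of nodes visited; the distances, however, are meaningful only with the queue.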
Marking a node is necessary to avoid reaching it more than once
- a corresponding field is needed in the data structure of the node (or an auxiliary array of flags, indexed by node)
- during the algorithm, a node passes successively through three states:
  - unmarked: not yet discovered (reached) by the algorithm
  - marked, in the waiting list: discovered but not yet processed
  - marked, out of the waiting list: processing finished
The "marked" field can take values that depend on the purpose of the traversal (e.g. the color of the node, its predecessor in the traversal, etc.)
In a disconnected graph, the traversal is repeated for each connected component.
Note: for several repeated traversals of the graph, it must be "unmarked" again (same algorithm, with the sense of the marking reversed).

Depth-first search
- recursively, by continuing the traversal from the last node reached
- non-recursively, if the waiting list is a stack => the next node taken out for processing is the last node reached
Breadth-first search
- if the waiting list is a queue => the elements are taken out in the order in which they were put in
  => traversal in order of distance (number of edges) from the initial node (e.g. for the shortest path, as number of edges traversed)
For trees:
Depth-first search corresponds to preorder traversal.
Breadth-first search corresponds to traversal by levels.
- marking the nodes is not necessary: trees are acyclic graphs

Separate compilation
By default, objects declared at file level are visible throughout a program (two declarations of the same identifier in different files denote the same object, see lecture 4)
=> the object is defined in a single file and declared in all the files that use it
Declarations that are not definitions:
- for variables: with the extern specifier
- for functions: only the prototype (header), not the function body
Compilation phases:
- compilation proper, file by file: .c -> .o (machine code, but still containing variable names instead of fixed addresses)
- link editing (linking): the references to an identifier from all the object files are replaced by the same address
Objects with the static specifier are not visible outside the file => the same identifier can be reused for different objects

Structuring a program into files
- one file for each portion of code that forms a logical entity
- with a minimum of interaction (no unnecessary global variables, etc.)
- the declarations of types, functions and variables that must be exported go in a header file (.h)
- this is included by every .c file that needs it
- to avoid including the declarations twice, the contents can be enclosed in
#ifndef FISIERULMEU_H
#define FISIERULMEU_H
/* the actual contents go here */
#endif

Abstract data types (ADT)
ADT = a mathematical model with a set of operations on it
=> a data structure + functions that operate on it
=> the notion of class in object-oriented programming
To implement an ADT in C:
- the data type (e.g. a structure) is hidden in the implementation part
- the .h file declares only a typedef for the type (or for a pointer to it)

typedef struct node {           /* in the .c file with the implementation */
    int info;                   /* and possibly other fields */
    struct node *nxt;
} node_t;
typedef struct node *list;      /* in the .h file */

- the user, who includes only the .h file, has no access to the internal structure of the type; access is allowed only through functions that read/modify the components of a variable of this type (as for FILE)

Design decisions:
- which operations to include
- whether objects or only pointers to objects are passed (pointers are needed for functions that modify the object)
- whether the result of an operation is returned (possibly dynamically allocated), or deposited in a specified object (already allocated)
  passed as a parameter
- whether the function returns an object, or a success/error code (with the object deposited at the address given by a pointer parameter)
See code examples for:
- queues and stacks implemented with arrays or lists
- sets (of integers), stored with one bit per element
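
The course's own code examples are not reproduced here. As a stand-in, the following is a hypothetical sketch of a set of integers stored with one bit per element, written to illustrate the design decisions above; the type intset and all the set_* function names are invented for this example. In a real library only an incomplete typedef would appear in the .h file, and the structure would stay in the .c file.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

/* Would be hidden in the .c file; the .h file would only contain
   "typedef struct intset intset;" and the function prototypes. */
typedef struct intset {
    unsigned nmax;          /* elements are integers in 0 .. nmax-1 */
    unsigned *bits;         /* one bit per possible element */
} intset;

/* Result returned, dynamically allocated: an empty set, or NULL on failure. */
intset *set_create(unsigned nmax) {
    intset *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->nmax = nmax;
    s->bits = calloc(nmax / WORD_BITS + 1, sizeof(unsigned));
    if (!s->bits) { free(s); return NULL; }
    return s;
}

void set_destroy(intset *s) {
    if (s) { free(s->bits); free(s); }
}

/* Modifies the set, so it takes a pointer; returns 0 on success, -1 on error. */
int set_add(intset *s, unsigned x) {
    if (x >= s->nmax) return -1;
    s->bits[x / WORD_BITS] |= 1u << (x % WORD_BITS);
    return 0;
}

/* Membership test; does not modify the set. */
int set_has(const intset *s, unsigned x) {
    return x < s->nmax && ((s->bits[x / WORD_BITS] >> (x % WORD_BITS)) & 1u);
}

/* Result deposited in an already allocated set dst; returns a status code. */
int set_union(intset *dst, const intset *a, const intset *b) {
    if (a->nmax != b->nmax || dst->nmax != a->nmax) return -1;
    for (size_t i = 0; i <= a->nmax / WORD_BITS; i++)
        dst->bits[i] = a->bits[i] | b->bits[i];
    return 0;
}
```

The sketch mixes both conventions on purpose: set_create returns a freshly allocated object, while set_union writes into an object supplied by the caller and reports success or failure through its return value.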
Computer Use and Programming, Lecture 19, Marius Minea, 17 May 2005

Abstract data types. Stacks. Queues

1. We define a structure type for a list element:
struct nod {
    int num; char sir[20]; char *ps;    // etc.: the useful information
    struct nod *next;                   // pointer to the next node
};
We can now declare list nodes, or pointers to them:
struct nod nod1; struct nod *head;
To avoid writing struct nod all the time, we define a synonym:
typedef struct nod node_t;      // node_t is a synonym for struct nod
or directly in the earlier declaration:
    typedef struct nod {
        // the payload of the node goes here
        struct nod *next;    // pointer to the next node
    } node_t;
We can now declare: node_t nod1; node_t *head;

2. To create a new list element, allocate it dynamically:
    node_t *n; n = malloc(sizeof(node_t));
3. Then fill in the payload of the node:
    int x; scanf("%d", &x); n->num = x;
For an array (string) field, the characters must be copied:
    char s[20]; scanf("%19s", s); strcpy(n->sir, s);
For a pointer field (e.g. to a string), the storage must be allocated first!
    char s[20]; scanf("%19s", s); n->ps = malloc(strlen(s)+1); strcpy(n->ps, s);
4. Then link the node at the desired place in the list; initially, node_t *head = NULL;
a) to insert at the head of the list, link n->next = head; head = n;
b) to insert at the tail of the list, also keep node_t *tail; and link:
    n->next = NULL; if (head==NULL) head = n; else tail->next = n; and then tail = n;
(if the list is empty, the new element becomes the head of the list, otherwise it is inserted after the old tail; in either case it becomes the new tail)

• data type: the set of values a variable can take
  - each data type has certain operators defined
• functions/procedures can be seen as an extension of the operators
  E.g.: concatenation of two strings; multiplication of two matrices (these even exist as operators in languages richer in types)
• abstract data type: a mathematical model + operations on that model
  E.g.: the set type (with membership test, union, intersection)
• data structure: a collection of variables (possibly of different types), used to implement abstract data types in a program

Stack:
- a list (sequence) in which elements are added and removed at the same end, in the reverse order of insertion
(LIFO - last in, first out)
- the name is inspired by everyday life (e.g. a stack of books)
• stack create()            /* creates a new stack */
• empty(stack)              /* tests whether the stack is empty */
• push(stack, element)      /* puts an element on the stack */
/* pop and top require a non-empty stack as a precondition */
• pop(stack) : element      /* removes and returns the top of the stack */
• top(stack) : element      /* returns the top of the stack */
• full(stack)               /* tests whether the stack is full */

    #include <stdlib.h>        // malloc
    #define MAX 100            // maximum size of the stack
    typedef int elem_t;        // or any other element type
    typedef struct stk {
        elem_t t[MAX];
        int sp;                // stack index
    } stack;                   // the type stack is a structure
    stack *create(void) { stack *s = malloc(sizeof(struct stk)); if (s) s->sp = 0; return s; }
    // empty, full and top do not modify the stack; push and pop do
    int empty(const stack *s) { return s->sp == 0; }
    int full(const stack *s) { return s->sp == MAX; }
    void push(stack *s, elem_t e) { if (s->sp < MAX) s->t[s->sp++] = e; }
    elem_t pop(stack *s) { return s->sp ? s->t[--s->sp] : 0; }
    elem_t top(const stack *s) { return s->sp ? s->t[s->sp-1] : 0; }

    typedef int elem_t;
    typedef struct stk { elem_t *base, *sp, *lim; } stack;
    stack *create(void) {
        stack *s = malloc(sizeof(struct stk));
        if (s) { s->base = s->sp = s->lim = NULL; }
        return s;
    }
    int empty(stack *s) { return s->sp == s->base; }
    int full(stack *s) { return 0; }
    void push(stack *s, elem_t e) {
        if (s->sp == s->lim) {    /* no room left: grow by 64 elements */
            size_t n = s->base ? (size_t)(s->sp - s->base) : 0;
            elem_t *p = realloc(s->base, (n + 64) * sizeof(elem_t));
            if (!p) return;       /* error, insufficient memory */
            s->base = p; s->sp = p + n; s->lim = p + n + 64;
        }
        *s->sp++ = e;
    }
    elem_t pop(stack *s) { return (s->sp != s->base) ? *--s->sp : 0; }
    elem_t top(stack *s) { return (s->sp != s->base) ? *(s->sp - 1) : 0; }
Note: all functions except empty() and top() modify the stack, so a pointer to the stack must be passed!

Variant that releases memory in pop() (symmetric with push):
    elem_t pop(stack *s) {
        elem_t e = (s->sp != s->base) ? *--s->sp : 0;
        if (s->base && s->lim - s->sp == 64) {    /* threshold for deallocation */
            size_t n = (size_t)(s->sp - s->base);
            if (n == 0) { free(s->base); s->base = s->sp = s->lim = NULL; }
            else {
                elem_t *p = realloc(s->base, n * sizeof(elem_t));
                if (p) { s->base = p; s->sp = s->lim = p + n; }
            }
        }
        return e;
    }

Encapsulation = the fact that implementation details are hidden from the user
To use the stack, a header file containing the following would be enough:
    typedef int elem_t;              // the element type must be specified
    typedef struct stk *stack;       // incomplete type declaration
    // stack is a pointer type to a structure not yet specified
    stack create(void); int empty(stack s); int full(stack s);
    void push(stack *s, elem_t e);   // here push and pop take a pointer parameter
    elem_t pop(stack *s);            // because they modify the stack
    elem_t top(stack s);
The implementation goes into a file stiva.c that is invisible to the user
- it can be compiled separately
- and then linked with the main program that uses the stack

For function calls, the computer uses a stack.
The processor has a special register in which it keeps the top of the stack.
When we call a function, the following are put on the stack:
- the arguments (parameters) of the function (usually with the first one topmost)
- the address of the instruction after the call (where execution resumes when the function ends)
- then the local variables are created on the stack (they disappear on return from the function => returning the address of a local variable is not correct)
Functions return in the reverse order of their calls (if f calls g and g calls h, we return from h, then from g, then from f)
=> a stack is a very natural choice for the implementation

Queue:
- a list (sequence) in which insertion is done at one end and extraction at the other, in the order in which the elements were inserted (FIFO = first in, first out)
• init(queue)               /* initializes the queue */
• empty(queue)              /* tests whether the queue is empty */
• enqueue(queue, element)   /* adds to the queue, if it is not full */
• dequeue(queue) : element  /* extracts from a non-empty queue */
• full(queue)               /* tests whether the queue is full */

    #define MAX 100        /* maximum size of the queue */
    typedef int elem_t;    /* or any other type */
    typedef struct {
        elem_t t[MAX];
        int head, tail;    /* insert at tail, extract from head */
    } queue;
    void init(queue *q) { q->tail = q->head = 0; }
    int empty(queue *q) { return q->head == q->tail; }
    void enqueue(queue *q, elem_t e) {
        if (((q->tail+1) % MAX) == q->head) return;    /* queue full */
        q->t[q->tail++] = e; q->tail %= MAX;
    }
    elem_t dequeue(queue *q) {
        if (q->head == q->tail) return 0;              /* queue empty */
        { elem_t e = q->t[q->head++]; q->head %= MAX; return e; }
    }
    int full(queue *q) { return ((q->tail+1) % MAX) == q->head; }

Hash tables. Modular programming
25 May 2004

When designing a program, the data structures must allow fast retrieval of the objects used, for efficient processing
- in an array: direct access to elements if we know the index
- or: structures with elements linked by pointers (e.g. in graphs, links between nodes and their edges)
Problem: access to objects referred to from outside the program by name (a non-numeric identifier => it cannot be used directly for indexing)
E.g.: a graph of cities; the user enters the name (not the number) of a node
E.g.: in a compiler, retrieving the record of a variable (when it is encountered)
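The lookup-by-name problem can be made concrete with the city-graph example. The sketch below is the naive baseline, not a method taken from the slides: it maps a name to a node index by comparing against every stored name in turn. The array city_names, its contents and the function node_index are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical city graph: node i is named city_names[i]. */
static const char *city_names[] = { "Arad", "Timisoara", "Lugoj", "Deva" };
enum { CITY_COUNT = sizeof city_names / sizeof city_names[0] };

/* Return the node index for a name, or -1 if the name is unknown.
   Sequential search: the cost grows with the number of names. */
int node_index(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < CITY_COUNT; i++)
        if (strcmp(city_names[i], name) == 0)
            return i;
    return -1;
}
```

Once the index is known, the node's data can be reached by direct array access; the expensive part is the name comparison loop, which is what a faster lookup structure has to replace.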
Utilizarea si programarea calculatoarelor, Lecture 18, Marius Minea

- sequential search (in an array or list) takes time proportional to the number of elements
- binary search (in an array) takes logarithmic time, but the structure must be kept sorted (overall, similar effort if there are many insertions/deletions)
- search in a binary tree (also logarithmic), but the tree must be kept relatively balanced: a complicated structure
Ideally, retrieval (access) would take practically constant time
- the idea: find a function h with a unique numeric value for each object considered, in a restricted range (usable as an index)
=> store each object x in an array at index h(x)
The technique is called hashing: the objects we work with are scattered across an array (the hash table)

- mathematically: a (partial) function $h : D \to V$, where $D$ is the domain of all possible objects, and the range of values (indices) is $V = \{0, 1, \ldots, N-1\}$
- e.g. in a compiler: $D$ is the set of all identifiers
- in practice, $|D| \gg |V|$ (in size), so $h$ cannot be injective on $D$
- but we need distinct values only for the subset of objects actually used, $D_u \subset D$ (e.g. the identifiers in a given C program)
Properties of hash functions:
- they should be fast to compute (for efficiency)
- they should have a distribution of values as uniform as possible, to minimize the probability of collision (equal values for different objects)

Most frequent case: functions for character strings
- computed from (almost) all characters (to distinguish strings as well as possible)
- with frequent bit shifts to "mix" the value obtained
Concrete examples (the string char *s; is traversed sequentially):
    for (h=len; len--;) h = ((h<<7) ^ (h>>27)) ^ *s++;    /* Knuth */
    for (h=5381; c=*s++; ) h += (h << 5) + c;             /* Bernstein */
    for (h=0; c=*s++; ) h = (h<<6) + (h<<16) - h + c;     /* SDBM */
At the end, the value is taken modulo the size of the array: h %= N
For other kinds of objects: computations can be done on the bytes of the object, grouped by 4 (or 2) and interpreted as integers
Even good functions have collisions (equal values for different objects)
=> these must be resolved (disambiguated) to allow correct retrieval

Closed hashing
- if another object y is found at index idx=h(x), the search continues according to some rule: sequentially (idx++), linearly (idx+=i), with a second function (idx+=h2(x)), until the object or an empty entry is found
- the table cannot hold more objects than the size of the array => on overflow, the objects must be redistributed into a larger array
- on deletion, the array entry must be marked "deleted", not "empty", so that searching still works correctly (until the object or an "empty" entry is found)
Open hashing
- an array entry: a list of objects with the same value of h => hashing + linear search in the list (short for good functions)
- requires dynamic allocation for the list elements (see example)
- the array may be smaller than the number of objects, but a comparable size is recommended (to have few objects with the same value of h)

Utilizarea si programarea calculatoarelor, Lecture 20, 31 May 2004, Marius Minea
Backtracking search

To solve a problem:
- we must choose the right solution idea (the algorithm)
- and the corresponding data structures it operates on
Some general techniques:
- Backtracking: when the solution is found only by trying all possible solutions
- The greedy method: when the solution is obtained by making the "best" move at each step
- Decomposition into subproblems (divide and conquer): by splitting the problem into several similar, smaller problems
- Dynamic programming: also through smaller subproblems, but usually with shared parts
Note: recursion is a more fundamental element than the methods above (all of them can be implemented recursively)
We need:
- a recursive search procedure
- a (global) data structure in which we store the solution
    procedure search(s: solution)
        if the solution is good, print it; return
        for each possible continuation
            add the continuation to the solution
            search(extended solution)
            remove the continuation from the solution
Note: this version prints all solutions. To print only one, we change the procedure so that it returns whether a solution was found or not, and we break out of the search loop if it was.

Color a graph with a given number of colors, so that adjacent nodes have different colors.
    procedure search()
        if all nodes are colored, print; return
        choose an uncolored node n
        for each color c
            color n with c
            if no neighbor is colored with c, search()
            uncolor n

Print, in increasing order, all permutations of the numbers from 1 to n.
Solution: for each position, from 1 to n, we choose in turn each of the numbers not yet chosen, and continue with the next position.
    procedure choose(p: position)
        if p > n, print the permutation
        for i from 1 to n
            if i is free
                put i at position p
                mark i as chosen
                choose(p + 1)
                mark i as free

In a (directed) graph, is there a path from a node s to a node f?
Solution: traverse the graph with backtracking, starting from s, until we find f or exhaust the paths. We turn back along the path when we close a cycle.
path: sequence of nodes
    procedure search(n: node)
        if n is f, print the current path; stop
        for each successor u of n
            if u is not on the current path
                add u to the current path
                search(u)
                remove u from the current path
    main program
        initialize the path with s
        search(s)

Abstract interpretation
- What is it, intuitively?
- Relationship to dataflow analysis
Value ranges
Fixpoints and infinite lattices
- Dataflow problems with infinite lattices
- Widening
- Narrowing
Two approaches to generating correct analyses
- Representation functions
- Correctness relations

"Execute" the program on an abstract program state
- Just like writing an interpreter, but
- Abstract program state represents all possible program states at a particular program point
- Covers all possible program inputs
What to do for multiple incoming control-flow edges? Join!
What to do for program loops? Iterate!

Abstract interpretation is a dataflow analysis
- A different way to construct correct analyses
- Induces a specific ordering on the "worklist"
Abstract program states are typically complete lattices
- Trivial join lattice for any domain $V$ with values $v_1, v_2, \ldots, v_n \in V$ implies an abstract interpretation
- Will permit lattices with infinite height
- Can combine multiple analyses into a single lattice
Trivial example: constant propagation

Start with the values in domain $V$ you are interested in
Example: the integers
Next, consider the operations that can be performed on values in $V$, e.g., $+$, $*$
For $v_1, v_2 \in V$ we say that $v_1 \rightarrow v_2$ if the value $v_1$ can be transformed to $v_2$
Determine the form of the elements in the lattice $L$
Construct the operations performed on the elements of the lattice $L$
For $l_1, l_2 \in L$ we say that $l_1 \rhd l_2$ if the lattice element $l_1$ can be transformed to $l_2$
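For the constant propagation example, one possible form for the lattice elements, together with the join applied where control-flow edges merge, can be sketched in C as follows. The type and function names (kind_t, absval, abs_join and the rest) are illustrative assumptions, not taken from the slides; the sketch uses the usual flat lattice with a bottom element ("no value yet"), one element per integer constant, and a top element ("any value").

```c
#include <assert.h>

/* Abstract value of one integer variable. */
typedef enum { BOTTOM, CONST, TOP } kind_t;
typedef struct { kind_t kind; int value; } absval;   /* value is meaningful only for CONST */

absval abs_bottom(void) { absval a = { BOTTOM, 0 }; return a; }
absval abs_top(void)    { absval a = { TOP, 0 };    return a; }
absval abs_const(int v) { absval a = { CONST, v };  return a; }   /* abstraction of the integer v */

/* Read back the constant; only valid for CONST elements. */
int abs_value(absval a)
{
    assert(a.kind == CONST);
    return a.value;
}

int abs_equal(absval a, absval b)
{
    return a.kind == b.kind && (a.kind != CONST || a.value == b.value);
}

/* Join: least upper bound of the abstract states arriving on two edges. */
absval abs_join(absval a, absval b)
{
    if (a.kind == BOTTOM) return b;
    if (b.kind == BOTTOM) return a;
    if (a.kind == CONST && b.kind == CONST && a.value == b.value) return a;
    return abs_top();   /* different constants, or one side already unknown */
}
```

Joining the same constant from both edges keeps it, while two different constants collapse to the top element, which is why this lattice loses information quickly at merges.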
What does the $\rightarrow$ for constant propagation involve?
Negation, addition, subtraction, multiplication, etc., of integers
What then does $\rhd$ involve?
Negation, addition, subtraction, multiplication, etc., of elements in the lattice
For negation, the following hold:
    $-(\top) \rhd \top$
    $-(\bot) \rhd \bot$
Binary operations will have, e.g., $l_1 \times l_2 \rhd l_3$
- What would $+$ look like?
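One way the abstract negation and addition could look on the constant propagation lattice is sketched below. The names and the representation are assumptions made for illustration. Two design choices are also assumptions rather than something the slides state: an operation with a bottom operand yields bottom, and a constant result that would overflow int is replaced by top.

```c
#include <assert.h>
#include <limits.h>

typedef enum { BOTTOM, CONST, TOP } kind_t;
typedef struct { kind_t kind; int value; } absval;   /* value is meaningful only for CONST */

static absval mk(kind_t k, int v)
{
    assert(k == BOTTOM || k == CONST || k == TOP);
    absval a = { k, v };
    return a;
}

/* Abstract negation: TOP stays TOP, BOTTOM stays BOTTOM, constants are negated. */
absval abs_neg(absval a)
{
    if (a.kind != CONST) return a;
    if (a.value == INT_MIN) return mk(TOP, 0);   /* -INT_MIN does not fit in int */
    return mk(CONST, -a.value);
}

/* Abstract addition: BOTTOM wins, then TOP wins, otherwise add the constants. */
absval abs_add(absval a, absval b)
{
    if (a.kind == BOTTOM || b.kind == BOTTOM) return mk(BOTTOM, 0);
    if (a.kind == TOP || b.kind == TOP) return mk(TOP, 0);
    /* both operands are constants: give up precision if the sum would overflow */
    if ((b.value > 0 && a.value > INT_MAX - b.value) ||
        (b.value < 0 && a.value < INT_MIN - b.value))
        return mk(TOP, 0);
    return mk(CONST, a.value + b.value);
}
```

The other binary operations follow the same pattern: handle the two special elements first, then apply the concrete operation to the constants.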
Constant propagation is boring: we can do better
Definition: a value range, denoted $[a : b]$, represents all values $x$ such that $a \le x \le b$, where $a \in \mathbb{Z} \cup \{-\infty\}$ and $b \in \mathbb{Z} \cup \{\infty\}$
Negation: $-[a : b] \rhd [-b : -a]$
Addition: $[a_1 : b_1] + [a_2 : b_2] \rhd [a_1 + a_2 : b_1 + b_2]$
Subtraction: $[a_1 : b_1] - [a_2 : b_2] \rhd [a_1 - b_2 : b_1 - a_2]$
Multiplication: $[a_1 : b_1] \cdot [a_2 : b_2] \rhd [\min(a_1 a_2, a_1 b_2, b_1 a_2, b_1 b_2) : \max(a_1 a_2, a_1 b_2, b_1 a_2, b_1 b_2)]$
Key points to revisit later:
- We know how to map from elements (integers) in $V$ to elements (value ranges) in $L$
- We can prove that the operations on elements of $V$ are "abstracted" by the operations on elements of $L$
Important relationship between $\rightarrow$ and $\rhd$
But now, let's try some abstract interpretation

Example: try it with the constant propagation lattice
- Not much of an improvement
Example: try it with the value range lattice
- Start at entry node
- Apply $\sqcup$ at control-flow joins
- Apply $\rhd$ for each operation
- Note: introducing [...] improves analysis
- What do we do at node 2? Join with $\bot$ (as in dataflow analysis)

How do we prove that our analysis is correct?
- Representation functions
- Correctness relations
Both methods are equivalent

Let $\beta : V \to L$ be a function that maps any value in $V$ to its "best" representation in $L$
Your analysis is correct if the following is true:
    $\beta(v_1) \sqsubseteq l_1 \land v_1 \rightarrow v_2 \land l_1 \rhd l_2 \Rightarrow \beta(v_2) \sqsubseteq l_2$
Intuitively: if a value can be safely described by a lattice element, then any value it is transformed into can be safely described by the corresponding transformation on the lattice element
Can we prove this for value ranges?

Let $R : V \times L \to \{\mathit{true}, \mathit{false}\}$ be a correctness relation
Given $v \in V$, $l \in L$, $v\ R\ l$ is true when $v$ is described by $l$
    $1\ R\ [-1 : 2] = ?$, 7 R ... = ?
General requirement: preservation of correctness
    $v_1\ R\ l_1 \land v_1 \rightarrow v_2 \land l_1 \rhd l_2 \Rightarrow v_2\ R\ l_2$
Two more conditions for correctness when dealing with lattices:
1. Lattice preserves $R$: $v\ R\ l_1 \land l_1 \sqsubseteq l_2 \Rightarrow v\ R\ l_2$
2. There is always a "best" approximation $l$ for every $v$: $(\forall l \in L' \subseteq L : v\ R\ l) \Rightarrow v\ R\ (\sqcap L')$
Interesting consequence: $v\ R\ l_1 \land v\ R\ l_2 \Rightarrow v\ R\ (l_1 \sqcap l_2)$

We mainly talk about a lattice $L$ for values of a single variable
Can take the Cartesian product of several of these lattices to handle multiple variables: $L = L_1 \times L_2 \times \cdots \times L_N$
Variables do not need to be of the same type: $L_1$ could be a value range lattice, $L_2$ a boolean lattice, and $L_3$ a points-to graph lattice

You can read about Galois connections to abstract interpretation in the class text, but it will hurt
We've only discussed forward semantics: you can do abstract interpretation backwards, and with meet lattices (everything is dual)
We only handled the "trivial" case of widening on back edges
- What to do about irreducible control-flow graphs?
- So long as you pick widening edges such that every cycle contains at least one widening edge, abstract interpretation "works"
- Bourdoncle studied these chaotic iteration strategies: an NP-complete problem, but with good heuristics

Constant propagation, dead-code elimination, etc.: can propagate constants and determine when conditions evaluate true or false
Array bounds analysis: detect bugs or remove checks that are known to be unnecessary
Bit width estimation: limit the sizes of registers when performing hardware synthesis
Static branch prediction: produce probabilities that particular branches will be taken

Example: Formalization and proof by resolution
Logica si structuri discrete, lecture notes, Marius Minea, 23 November 2017

1 Formalizing the statements
We translate into logic the following exercise, taken from http://www.cs.utexas.edu/users/novak/reso.html
1. Every investor bought stocks or bonds.
2. If the Dow Jones index falls, all stocks except gold fall.
3. If the treasury raises the interest rate, all bonds fall.
4. Any investor who bought something that falls is not happy.
5. If the Dow Jones index falls and the treasury raises the interest rate, all happy investors bought some gold stock.

1. Every investor bought stocks or bonds.
In predicate logic, variables stand for arbitrary elements, of any kind, from the universe. We cannot say "in this formula, we chose X to be an investor", because X can be anything. To represent categories (types) of entities, we use predicates. We introduce the predicate inv(X) (X is an investor).
The words "every", "any", "all", etc. introduce a universally quantified variable: $\forall X$. However we choose X from the universe, the quantified formula is true. X can be of any kind (investor, student, house, number, etc.), but the sentence is about investors, so it says something about X only if the chosen X is an investor. For this reason, the universal quantifier usually appears together with implication: for any X, if X is an investor, X did something.
    $\forall X.\ inv(X) \to$ (what we know about the investor)
What do we know about the investor? He bought something, so there exists something that the investor bought. When we define a predicate, we use the arguments in the usual order of the sentence (subject, then object). So cump(X, Y): X bought Y.
    $\forall X.\ inv(X) \to \exists Y.\ cump(X, Y) \land$ (what we know about Y)
    $A_1$: $\forall X.\ inv(X) \to \exists Y.\ cump(X, Y) \land (act(Y) \lor oblig(Y))$

2. If the Dow Jones index falls, all stocks except gold fall.
The Dow Jones index is a unique notion, so we represent it by a constant dj. Here too we refine step by step:
    $scade(dj) \to$ (what happens)
    $scade(dj) \to \forall X.$ (conditions for X) $\to scade(X)$
    $A_2$: $scade(dj) \to \forall X.\ act(X) \land \lnot aur(X) \to scade(X)$

3. If the treasury raises the interest rate, all bonds fall.
"The treasury raises the interest rate" is a sentence with subject, verb and object, which suggests introducing a predicate. But the sentence appears again, unchanged, in statement 5, and the verb "raises" appears only here, so there are no other entities that can "raise" something. We can therefore represent "the treasury raises the interest rate" as a proposition: it is either true or false, and does not refer to any variable.
    $A_3$: $crdob \to \forall X.\ oblig(X) \to scade(X)$

4. Any investor who bought something that falls is not happy.
    $\forall X.\ inv(X) \to$ (what we know about X)
    $\forall X.\ inv(X) \to ($(condition for X) $\to \lnot bucuros(X))$
    $\forall X.\ inv(X) \to ((\exists Y.\ cump(X, Y) \land scade(Y)) \to \lnot bucuros(X))$
We have a structure with two implications, of the form $A \to (B \to C)$. We will also see it written without parentheses, because by convention implication is right-associative.
Alternatively, we know that $p \to (q \to r) = \lnot p \lor (\lnot q \lor r) = \lnot(p \land q) \lor r = (p \land q) \to r$, so we can rewrite it with a conjunction. Recall that $\land$ has priority over $\to$, so $A \land B \to C = (A \land B) \to C$.
    $A_4$: $\forall X.\ inv(X) \land (\exists Y.\ cump(X, Y) \land scade(Y)) \to \lnot bucuros(X)$
The tautology $A \to (B \to C) = (A \land B) \to C$ also has an analogy in programming: if (A) if (B) C; is equivalent to if (A && B) C;

5. If the Dow Jones index falls and the treasury raises the interest rate, all happy investors bought some gold stock.
    $scade(dj) \land crdob \to$ (what happens)
    $scade(dj) \land crdob \to \forall X.\ inv(X) \land bucuros(X) \to$ (what we know about X)
    $C$: $scade(dj) \land crdob \to \forall X.\ inv(X) \land bucuros(X) \to \exists Y.\ cump(X, Y) \land act(Y) \land aur(Y)$

2 Conversion to clausal form (conjunctive normal form)
We have rewritten the formulas with parentheses instead of the dot, to avoid misunderstandings and mistakes with the quantifiers.
    $A_1$: $\forall X(inv(X) \to \exists Y(cump(X, Y) \land (act(Y) \lor oblig(Y))))$
    $A_2$: $scade(dj) \to \forall X(act(X) \land \lnot aur(X) \to scade(X))$
    $A_3$: $crdob \to \forall X(oblig(X) \to scade(X))$
    $A_4$: $\forall X(inv(X) \land \exists Y(cump(X, Y) \land scade(Y)) \to \lnot bucuros(X))$
    $C$: $scade(dj) \land crdob \to \forall X(inv(X) \land bucuros(X) \to \exists Y(cump(X, Y) \land act(Y) \land aur(Y)))$
To prove by contradiction, we show that the hypotheses $A_1$-$A_4$ together with the negated conclusion $\lnot C$ lead to a contradiction. We negate the conclusion first, before transforming the quantifiers!
    $\lnot C$: $\lnot(scade(dj) \land crdob \to \forall X(inv(X) \land bucuros(X) \to \exists Y(cump(X, Y) \land act(Y) \land aur(Y))))$
We go through the same steps as for propositional formulas, with additional steps specific to predicates.
1a. Eliminate implication: $A \to B = \lnot A \lor B$, $\lnot(A \to B) = A \land \lnot B$
1b. Push negation inward down to the predicates: $\lnot \forall x P(x) = \exists x \lnot P(x)$, $\lnot \exists x P(x) = \forall x \lnot P(x)$
For $\lnot C$ this gives, step by step:
    $scade(dj) \land crdob \land \lnot \forall X(inv(X) \land bucuros(X) \to \exists Y(cump(X, Y) \land act(Y) \land aur(Y)))$
    $scade(dj) \land crdob \land \exists X \lnot(inv(X) \land bucuros(X) \to \exists Y(cump(X, Y) \land act(Y) \land aur(Y)))$
    $scade(dj) \land crdob \land \exists X(inv(X) \land bucuros(X) \land \lnot \exists Y(cump(X, Y) \land act(Y) \land aur(Y)))$
    $scade(dj) \land crdob \land \exists X(inv(X) \land bucuros(X) \land \forall Y \lnot(cump(X, Y) \land act(Y) \land aur(Y)))$
    $scade(dj) \land crdob \land \exists X(inv(X) \land bucuros(X) \land \forall Y(\lnot cump(X, Y) \lor \lnot act(Y) \lor \lnot aur(Y)))$
and for $A_1$:
    $A_1$: $\forall X(\lnot inv(X) \lor \exists Y(cump(X, Y) \land (act(Y) \lor oblig(Y))))$
The Y bound by $\exists$ depends on X. We choose a new Skolem function $f$, replace Y by $f(X)$, and $\exists Y$ disappears:
    $A_1$: $\forall X(\lnot inv(X) \lor (cump(X, f(X)) \land (act(f(X)) \lor oblig(f(X)))))$
In $\lnot C$, the existentially quantified X does not depend on any universally quantified variable, so it is replaced by a new Skolem constant $b$.
At this point negation is applied only directly to predicates.
With uniquely named variables, the universal quantifiers can be moved to the front (intuitively, we can "choose" the value of the variable from the start; it does not matter if we also move $\forall x$ over a subformula that does not depend on $x$). This is called prenex normal form.
Then we eliminate the universal quantifiers, treating them as implicit. In practice, starting from the previous step, we can directly delete all universal quantifiers from the formulas. The resulting clauses are:
    (1) $\lnot inv(X) \lor cump(X, f(X))$    from $A_1$
    (2) $\lnot inv(X) \lor act(f(X)) \lor oblig(f(X))$    from $A_1$
    (3) $\lnot scade(dj) \lor \lnot act(X) \lor aur(X) \lor scade(X)$    from $A_2$
    (4) $\lnot crdob \lor \lnot oblig(X) \lor scade(X)$    from $A_3$
    (5) $\lnot inv(X) \lor \lnot cump(X, Y) \lor \lnot scade(Y) \lor \lnot bucuros(X)$    from $A_4$
    (6) $scade(dj)$    (7) $crdob$    (8) $inv(b)$    (9) $bucuros(b)$    from $\lnot C$
    (10) $\lnot cump(b, Y) \lor \lnot act(Y) \lor \lnot aur(Y)$    from $\lnot C$

3 Proof by resolution
We generate resolvents until we reach the empty clause. We look for clauses with opposite predicates, $P$ and $\lnot P$, unify the arguments and add the resulting resolvent (the rest of the two clauses, with the substitution applied, joined by $\lor$). Possible heuristics are to use clauses with a single literal, and to eliminate the predicates one at a time, as in propositional calculus.
    (11) $\lnot act(X) \lor aur(X) \lor scade(X)$    eliminate scade(dj) from (3, 6)
    (12) $\lnot oblig(X) \lor scade(X)$    eliminate crdob from (4, 7)
    (13) $\lnot inv(X) \lor act(f(X)) \lor scade(f(X))$    eliminate oblig with $X_{12} = f(X)$ from (2, 12)
A resolvent may contain the same predicate twice. We can simplify, $P(X) \lor P(X) = P(X)$, only if the arguments are the same, e.g. $\lnot act(X)$ in resolvent (15) or $scade(f(X))$ in (16) below:
    (15) $\lnot cump(b, X) \lor \lnot act(X) \lor scade(X)$    eliminate aur with $Y = X$ from (10, 11)
    (16) $\lnot inv(X) \lor scade(f(X)) \lor \lnot cump(b, f(X))$
[...]
Once the empty clause is derived, $A_1 \land A_2 \land A_3 \land A_4 \land \lnot C$ is not satisfiable, and we have proved the statement we set out to prove: $A_1 \land A_2 \land A_3 \land A_4 \to C$

Compiling VHDL into a High-Level Synthesis Design Representation
Petru Eles*, Krzysztof Kuchcinski†, Zebo Peng†, Marius Minea*
* Computer Science and Engineering Department, Technical University of Timisoara, Romania

Abstract
This paper presents an approach to use VHDL as input specification to the CAMAD high-level synthesis system. In particular, it describes a synthesis-oriented compiler which takes a subset of VHDL as input and compiles it into the internal design representation of CAMAD, which can then be synthesized into register-transfer level design. Since CAMAD supports the design of hardware with concurrency and asynchrony, our VHDL subset includes the concurrent features of the language. We present also in the paper some important conclusions concerning how to deal with signals, wait statements, structured data, and
subprograms.

1 Introduction

VHDL is one of the most widely used languages in digital circuit design. The existing IEEE standard defines a very rich language for hardware description and simulation. However, the problem of extending the use of VHDL to the field of hardware synthesis has no definitive solution yet. There are even discussions on to what extent VHDL is adequate as a synthesis language. The main difficulty of using VHDL for synthesis results from the simulation-oriented semantics of standard VHDL; some VHDL features cannot be synthesized, and others can only be synthesized using very sophisticated hardware. Therefore, a useful and efficient high-level synthesis system should accept only a subset of VHDL, possibly with some synthesis-oriented extensions which can be ignored when simulation is carried out.

In this paper we present a synthesis-oriented compiler based on a broad subset of VHDL. We describe the language subset, the internal design representation based on an extended timed Petri net model, and the implementation of the compiler as part of the CAMAD high-level synthesis system developed at Linköping University. One of the main issues addressed in this paper is how to capture the VHDL semantics in our design representation. Our approach differs from most other VHDL synthesis projects by considering a wider class of VHDL descriptions and dealing with the synthesis of concurrent processes.

This work has been partially sponsored by the Swedish National Board for Industrial and Technical Development (NUTEK).
† Dept. of Computer and Information Science, Linköping University, Sweden

The paper is divided into six sections. Section 2 discusses some previous work on VHDL synthesis and outlines our approach. Section 3 presents the internal design representation used in our synthesis system. Section 4 describes the compilation of the selected VHDL subset into the internal representation together with a discussion of some
semantic aspects of the VHDL subset. Finally, Section 5 deals with the implementation of the compiler and Section 6 presents our conclusions.

2 Background and Related Work

The most difficult and widely discussed features of VHDL, from the point of view of synthesis, are the timing model, concurrency and synchronization, and subprograms. However, most previous work makes use of only the sequential aspects of VHDL. For example, Camposano assumes that the behavioral hardware specification captured by VHDL is sequential, and only synchronous hardware is synthesized. Thus, sensitivity clauses of the wait statements are ignored and wait on signals is not allowed.

VSYNTH, a behavioral synthesis system, is another example of using a VHDL subset that is restricted to a purely sequential description. As in Camposano's work, the architecture body may only contain a single process. No wait statement is accepted.

Bender and Stevens point out that a VHDL description is difficult to synthesize efficiently, mainly because of the low-level synchronization and communication concepts based on signals. To overcome this difficulty they replace the signal concept by several other concepts representing different synchronization and communication facilities. By doing this, practically a new language with different semantics is defined.

0-8186-2780-8/92 $3.00 © 1992 IEEE

Postula describes SynVHDL, a subset of VHDL for high-level synthesis. It is defined based on the assumption that a design will be described as a set of processes to be synthesized one at a time. Each process will result in a separate synchronous hardware module. In SynVHDL an architecture body contains a single process, and wait statements are allowed to have only a clock signal on their sensitivity lists. No subprograms are allowed.

Lis and Gajski propose a methodology to cope with the difficulties of VHDL synthesis. Their approach differs in some way from those discussed above; instead of imposing
restrictions directly on the language, they define four design models and recommend for each model a corresponding description style. The compiler embodied in their system works in four different modes according to the corresponding VHDL descriptions.

The work discussed above usually integrates VHDL into a high-level synthesis environment by imposing restrictions on the language. The authors try to solve the basic problem of interpreting VHDL's semantics in the world of synthesis by excluding some strictly simulation-oriented facilities from the language. Their methods usually synthesize only one part of the VHDL description without considering its relation with the rest. Most of these systems restrict themselves to a practically sequential subset of VHDL, with a very restricted use of signals.

Our approach is to accept for synthesis a larger subset of standard VHDL, which we call S'VHDL. When defining the subset we first of all eliminate facilities which are ambiguous (for instance those related to timing) or irrelevant (those connected to structural description, access types, etc.) from the point of view of high-level synthesis. The overall structure of a program in S'VHDL comprises entity declarations, architecture bodies, package declarations and package bodies, with the following properties:
• an architecture body may contain any number of concurrent statements;
• scalar and composite types, with the exception of access and file types, are accepted;
• signals can only be of scalar or bit-string type;
• recursive calls are not allowed in procedures;
• all sequential statements, with the exception of the assertion statements, are accepted; and
• the structural aspects (such as component instantiation or generate statements) are excluded.

We have implemented a compiler which takes a digital system specification in S'VHDL and generates an internal design representation. The compiler is designed as part of the CAMAD system, which synthesizes the internal representation
into a register-transfer level design.

3 The ETPN Design Representation

The internal design representation of CAMAD is called ETPN (extended timed Petri net), which has been developed to capture the intermediate results during the high-level synthesis process. The representation model is based on two separate but related parts: a control part and a data part. The representation uses Petri nets to provide a concurrent and asynchronous description of control. The data path of the design representation is represented as a directed graph with nodes and arcs. The nodes are used to capture data manipulation and storage units. The arcs represent the connections of the nodes. The control part of the design representation, on the other hand, is captured as a timed Petri net with restricted transition firing rules. These two parts are related by the control signals coming from the control part to the data path, and the conditional signals traveling in the opposite direction.

In the examples throughout the paper, data path nodes will be represented as rectangles with labels indicating the functions of the nodes or their names (if the node is a register). The arcs of the data path represent the data flow between function nodes. Communication of data from one node to another is controlled by the control signals coming from the control part. The control relation is indicated by using control state labels to guard arcs. When a control state S, in the Petri net representing the control flow, holds a token, its associated arcs in the data path (arcs guarded by the corresponding label) will be open for data to flow.

Control states or places of the control Petri net will be depicted in our examples as circles. The transitions of control states are represented as firings of one or several transitions of the Petri net, which are depicted as bars. To express that the control flow can be guarded by results of internal computations, we use conditional signals to guard the control flow. A transition may be guarded
by one or more conditions produced from the data path. A transition may be fired when it is enabled (all its input places have a token) and the guarding condition is true. If a transition has more than one guarding condition and at least one of them is true, the transition's guarding condition is true.

4 Compilation of S'VHDL to ETPN

In this section, we present the compilation of the selected VHDL subset into the ETPN design representation by discussing how certain basic S'VHDL constructs can be represented in the ETPN model.

4.1 Wait Statements and Signal Assignments

In VHDL the wait statement can have three basic forms. Two of them, namely "wait for" and "wait until", are relatively easy to represent in ETPN, while the "wait on" statement causes problems. In this section we will concentrate mainly on "waiting on events and transactions", while other forms will be briefly discussed later.

A wait statement on a signal in an S'VHDL process will result in suspending the process until an event on the specified signal occurs. Such an event occurs when the signal changes its value as the result of an assignment statement. In Figure 1, we show how the wait statement (a) and the signal assignment (c) are represented in ETPN. Waiting for an event is solved by associating the condition Cs to a transition in the waiting process. The condition Cs will be produced as the result of an assignment to the respective signal, if the value of the signal changes. For reasons of simplicity we will use a compressed representation for signals (equivalent to that in Figure 1(c)) as illustrated in Figure 1(d).

[Figure 1: (a) wait on event ("wait on s"); (b) wait on transaction ("wait on s'transaction")]

synchronization
Process 1 | 5 | 1 comparator | 0.70
Process 2 | 5 | 1 comparator | 0.70
Process 3 | 25 | 1 ALU, 1 decoder | 14.27
Am 2901 | Unrestricted | 22 | 1 ALU, 1 adder, 1 decoder, 1 inverter | 112.62
Elliptic filter | Unrestricted | 19 | 1 adder, 1 multiplier | 10.83
Table 1: Summary of synthesis results

of CAMAD show that our system performs well
also with arithmetic-dominated hardware, although the elliptic filter example consists of only one process. Results reported elsewhere indicate 18 states using three adders and one multiplier, with a CPU time of 360 seconds; in another reported synthesis, the elliptic filter is synthesized to 19 states, with three adders and one multiplier, in 107 seconds.

Finally, by synthesizing the Am 2901 four-bit microprocessor slice listed in Table 1, we demonstrate that CAMAD is able to synthesize VHDL specifications of standard commercial microprocessor structures with results that are similar to the original manual design.

6 Conclusions

This paper addresses one of the most difficult aspects in the hardware synthesis of behavioral VHDL specifications, namely the synthesis of concurrent processes while preserving standard VHDL simulation semantics.

We first developed a model that allows a practically unrestricted use of signals and wait statements by producing synchronous hardware with a global control of process synchronization for signal update. The hardware can be controlled either by a single state machine or by a collection of FSMs working synchronously together.

With our second model we have shown that it is possible to relax the strong synchronization imposed by the VHDL simulation cycle without affecting the semantic correctness of the synthesized circuit. S'VHDL descriptions written according to this style are synthesized to hardware with a higher degree of parallelism and asynchrony, without any need for additional global synchronization.

The results we report in the paper show that the CAMAD high-level synthesis system can efficiently handle ETPN design representations produced by the S'VHDL compiler, including designs described as interacting concurrent processes according to the proposed models. More research is needed, however, in the area of high-level specific transformations applicable to concurrent processes and communication protocols.

References

Bergamaschi, R. A., Kuehlmann, A., A System for
Production Use of High-Level Synthesis, IEEE Transactions on Very Large Scale Integration (VLSI), vol. 1, no. 3, Sept. 1993, pp. 233-243.
Biesenack, J., et al., The Siemens High-Level Synthesis System CALLAS, IEEE Transactions on Very Large Scale Integration (VLSI), vol. 1, no. 3, Sept. 1993, pp. 244-253.
Camposano, R., Saunders, L. F. and Tabet, R. M., VHDL as Input for High-Level Synthesis, IEEE Design and Test of Computers, March 1991, pp. 43-49.
Eles, P., Kuchcinski, K., Peng, Z., Minea, M., Compiling VHDL into a High-Level Synthesis Design Representation, Proc. EURO-DAC EURO-VHDL'92, 1992, pp. 604-609.
Eles, P., Kuchcinski, K., Peng, Z., Minea, M., Two Methods for Synthesizing VHDL Concurrent Processes, Research Report, LiTH-IDA-R-93-22.
Ecker, W., Using VHDL for HW/SW Co-Specification, Proc. EURO-DAC EURO-VHDL'93, 1993, pp. 500-505.
Harper, P., Krolikoski, S., Levia, O., Using VHDL as a Synthesis Language in the Honeywell VSYNTH System, in J. A. Darringer, F. J. Rammig (Editors), Computer Hardware Description Languages and their Applications, North Holland, 1990, pp. 315-330.
IEEE Standard VHDL Language Reference Manual, IEEE Std 1076-1987, IEEE Computer Soc. Press, 1987.
Müller, J., Kramer, H., Analysis of Multi-Process VHDL Specifications with a Petri Net Model, in: Proc. EURO-DAC EURO-VHDL'93 (1993), pp. 474-479.
Nagasamy, V., Berry, N., Dangelo, C., Specification, Planning, and Synthesis in a VHDL Design Environment, IEEE Design & Test of Computers, June 1992, pp. 58-68.
Paulin, P. G., Knight, J. P., Force-Directed Scheduling for the Behavioral Synthesis of ASIC's, IEEE Transactions on Computer-Aided Design, vol. 8, no. 6, June 1989, pp. 661-679.
Peng, Z., Kuchcinski, K., Automated Transformation of Algorithms into Register-Transfer Level Implementation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 13, no. 2, Feb. 1994, pp. 150-166.
Postula, A., VHDL Specific Issues in High Level Synthesis, Proc. Euro-VHDL'91, 1991, pp. 70-77.
Ramachandran, L., Vahid, F., Narayan,
S., Gajski, D., Semantics and Synthesis of Signals in Behavioral VHDL, Proc. EURO-DAC EURO-VHDL'92, 1992, pp. 616-621.
Roy, J., Kumar, N., Dutta, R., Vemuri, R., DSS: A Distributed High-Level Synthesis System, IEEE Design & Test of Computers, June 1992, pp. 18-32.
Vemuri, R., Roy, J., Mamtora, P., Kumar, N., Benchmarks for High Level Synthesis, Technical Memo-ECE-DDE-91-11, University of Cincinnati, 1991.

A Formal Approach for Automated Reasoning about Off-Line and Undetectable On-Line Guessing (Short Paper)

Bogdan Groza and Marius Minea
Politehnica University of Timisoara and Institute e-Austria Timisoara
bogdan.groza@aut.upt.ro, marius@cs.upt.ro

Abstract

Starting from algebraic properties that enable guessing low-entropy secrets, we formalize guessing rules for symbolic verification. The rules are suited for both off-line and on-line guessing and can distinguish between them. We add our guessing rules as state transitions to protocol models that are input to model checking tools. With our proof-of-concept implementation we have automatically detected guessing attacks in several protocols. Some attacks are especially significant since they are undetectable by protocol participants, as they cause no abnormal protocol behavior, a case not previously addressed by automated techniques.

1 Motivation and Related Work

As password-based authentication continues to be used in practice and weak passwords are still chosen by users, detecting protocols subject to guessing attacks is a topic of high interest in security. In this paper we address the problem of formalizing a previously introduced approach to detect guessing attacks in a manner suitable for implementation in an automated verification toolset. We use IF (Intermediate Format), a specification language that can be handled by model checkers such as OFMC (Open-Source Fixed-Point Model-Checker) and SATMC (SAT-based Model Checker) from the AVISPA toolset.

An earlier effort to integrate guessing rules in OFMC exists, which gives a
formalization for off-line guessing attacks. In comparison, our contribution proposes a different formalism (with guessing rules based on a different reasoning), which allows us to reason on both on-line and off-line attacks. Our guessing rules are implemented at the level of the protocol, without requiring the modification of the back-end model checkers. Other concrete implementations of guessing detection are by Corin et al. [7], Lowe [13], who used Casper/FDR, and Blanchet [5] in ProVerif, a verifier based on Prolog rules. Our implementation is based on IF, a specification language which can be handled by several back-end model checkers, including OFMC and SATMC, v '°i 'cfnia giicstiiig attocks °)l bot l Ь ^riiion eensidoringths lott'r nrlu(Uiel opeseinrs un stsmic ici-iis

Facts are predicates defined over terms, such as iknows, state, contains, etc.

Definition 2. We call a symbolic protocol description P a triple composed of an initial state, a set of transition rules and a set of attack states, i.e., P = (InitialState, TransitionRules, AttackStates), where: i) the initial state is a conjunction of ground facts, ii) a transition rule has the form LHS ⇒ RHS, where LHS and RHS are conjunctions of facts, and LHS may also contain a negated fact and a

Given s ∈ S(T), denote by O_T^s(·) the oracle corresponding to the function obtained by making s a variable in T and keeping other parts constant, e.g., 0сгурф,т) g отшз1е eorrenpynnmg to SOi'’ cryptir, m)

Lemma 1. The symbolic protocol description P is algebraically dependent on s, i.e., P dep s, if and only if any function obtained as O_T^s(·), where s ∈ S(T) and ihears T, is strongly distinguishing |none o|i hpv

Lemma 1 relates a symbolic protocol description with the algebraic notion of strongly distinguishing function. Since injective functions are strongly distinguishing in one query, any symbolic protocol description in which a symbol s occurs only in the body of an injective (lrj Пг| with a key
computed as a strongly distinguishing function on the secret, controls the corresponding decryption oracle, and can establish a relation to one or several parts of the encrypted messages. We formalize this case as follows:

Definition 5. We call s-dependent an encryption or decryption oracle that uses a key containing s. An adversary that hears the encryption of some message with a key that contains s is said to observe an s-dependent encryption oracle. Moreover, we say that he controls the corresponding s-dependent decryption oracle if, by replacing s in the encryption key with a fresh s' known to him, the adversary can decrypt arbitrary messages encrypted with the new key, i.e., Hhears {M}K g 00 ' s ' observes(w;LMCce•)) r4) Fs sate') {M'}pe ih Heysios M '=h controiskOsM^ee 1 (•)) t M - ayp

The encryption key must contain s, i.e., ihears {M}K ∧ K ⊢part s as a premise. This is of course needed for the question of controlling the oracle to make sense. To express a relation between encrypted inputs we employ a derivation rule Fact ⊢concat T to produce all distinct messages M that satisfy a property Fact(M), by concatenating them into term T. For example, (ihears M) ⊢concat T yields a term T that is the concatenation of all distinct terms for which ihears M holds. Similarly, (Hh^re 0c}k h К F an a) НУ, , e>cr p)iry wiso r key that contains s. Also, let T ⊢split {T', T''} denote that T' and T'' are derived by splitting T into disjoint subsets of terms (at least one of them non-empty).

The second guessing rule provides the control capabilities to find a relation between two terms (the relates fact): the adversary can use any available operators: pair, crypt, etc., as well as his Dolev-Yao abilities, Vcet ars'rar, ote. Thus, for deciding relates the adversary can perform any transition allowed by the symbolic protocol description P. The following definition models this intuition.

Definition 6. An adversary can relate two terms T' and T'' of a symbolic protocol description P if by adding T' to the adversary
knowledge he can derive T'' using any of his abilities over P; in this case relates(T', T'') holds.

Lemma 3. Let P be a symbolic protocol description eecOtliai te Veos. If the adversary observes one or more s-dependent encryption oracles for which he controls the corresponding decryption oracles and can relate parts of the encrypted messages, then the adversary can guess the secret, i.e.,

observes(O^s_{M}K(·)) ∧ controls(O^s_{M}K^-1(·)) ∧ ... ⊢concat T ∧ T ⊢split {T', T''} ∧ relates(T', T'') ⇒ guess(s)    (7)

3 Implementation and Experimental Results

Our formalization of the guessing calculus makes it amenable to an implementation where states are sets of terms, and transitions are given as rewrite rules, as in the IF protocol specification language. Derivations such as ihears, ⊢part, ⊢split yield corresponding IF facts. These are combined into rules to establish the relations observes and controls, and ultimately, guessing.

We use an adversary model with standard Dolev-Yao abilities: the adversary can fake new messages, intercept sent messages or overhear them. Moreover, the adversary has the standard computational abilities: he can encrypt and decrypt if he knows the corresponding key, and he can pair and decompose messages. Based on this model we can now express rules for the adversary's ability to observe and control oracles. To decide whether a composed term represents an oracle, we need to determine if it contains the secret to be guessed. By overhearing such a term, the adversary observes the oracle. RirtFdrjtonlsride controls, we start from terms eoni siiim sphe needet, constrpstppd t,etico in which the secret is replaced bfe ns gire dnccld' nnibdddnd int;o p]ed pudstmn nike

Guessing multiple secrets. To detect guessing in such scenarios, secrets already guessed ninsi Anm1 ie pe bsdqi idig gucrscs However, dhis le gnd dffdctivd soluticg dpprdosds rg• ,, n^ine cnls lincd on ^-ео^сеи and controls abilities) ;p i i riis lmnolllin protoeolhsdlf Arg recgh
any guessed value is gdddd po ifcesd^ s ^mgpgotiooohmdcprvdcptijtTir cela erni m mrestpil in gnp propocol spceifidgtimr pgp t'nl0Sdschcln ins>' тиШр е gnnsnne

Distinguishing detectable from undetectable on-line attacks. As a first intuition, if guessing takes place after a participant has reached a final state, then guessing goes undetected by that participant. T|fc ie^S^incsn ^^d^romg;, se thd sgmd [>;n l ici[>;iifins he Vd snn iк'e hmU-inee stih силш^ 'li To distinguish undetectable from detectable on-line guessing, we need to express that all participant instances have successfully sn n|itA^

to access remote printers, files, etc., and has three versions: NTLMv1, NTLMv2 and NTLMv2-Session. Figure 1 presents MS-CHAP v2 and NTLM v2-Session. We have augmented the MS-CHAP v2 protocol model with guessing rules. As expected, OFMC found the attack in Figure 1; a similar attack can be traced for NTLM. The intruder acts as man-in-the-middle. Guessing is possible because the intruder hears h(kab, Na(3)) and, knowing Na(3), can compute h(kab_rpl, Na(3)) for arbitrary replacements kab_rpl of kab. By Lemma 2 this means that the intruder observes and controls the oracle O 0),whott f = hts-> NaU). The last three trace steps are intruder reasoning; they reflect the fact that s ility doiebting flaws in such a protocol becomes cltu |bl|e the discovery of new flaws in the overall system.

The Norwegian ATM. A tw tukli a) and can perform guessing. This is an off-line attack and the trace represents intruder deductions: controlling the oracle with replaced values (1), the intruder deduces both PIN and BKey (2), and thus the PIN (3). In practice this is impossible because some PIN would match for some DES key; moreover, only 16 bits of the result are stored, yielding a huge number of potential values for BKey and PIN. To test our calculus, we restricted the adversary from trying replacements for BKey. Thus, the adversary no longer controls the oracle ODESns. OFMC found the attack trace above: the adversary, being
issued a legal card (1), uses the PIN cliriiiyt'pro {scv, 7 A^В {0[Hb},Ra}fc o 1 -v рола S

Fig. 3. The Lomas et al. protocol

mainly an implementation issue) is just a random value (indistinguishable from a nonce) encrypted with the correct password of A. Further on, the server answers in step 4 with {Nai1, k( > Na2'}pwdA, but now Nai1 is known to the adversary as he has forged the message in step 1, and thus he can make a correct guess. This attack is not based on a replay, as the message in step 1 was never received by S before. To the best of our knowledge, this attack is new.

4 Conclusions

We have formalized rules for detecting guessing attacks, linking their underlying algebraic properties to ehe rolo d r

security protocols analysis. International Journal of Information Security 7(1), 3-32 (2008)
3. Basin, D.A., Mödersheim, S., Viganò, L.: OFMC: A symbolic model checker for security protocols. International Journal of Information Security 4(3), 181-208 (2005)
4. Baudet, M.: Deciding security of protocols against off-line guessing attacks. In: 12th ACM Conf. on Computer and Communications Security, pp. 16-25 (2005)
5. Blanchet, B.: An Efficient Cryptographic Protocol Verifier Based on Prolog Rules. In: 14th IEEE Computer Security Foundations Workshop, pp. 82-96 (2001)
6. Corin, R., Doumen, J.M., Etalle, S.: Analysing password protocol security against off-line dictionary attacks. In: 2nd Workshop on Security Issues with Petri Nets, pp. 47-61 (2004)
7. Corin, R., Malladi, S., Alves-Foss, J., Etalle, S.: Guess what?
Here is a new tool that finds some new guessing attacks, pp. 62-71 (2003)
8. Ding, Y., Horster, P.: Undetectable on-line password guessing attacks. Operating Systems Review 29(4), 77-86 (1995)
9. Drielsma, P.H., Mödersheim, S., Viganò, L.: A formalization of off-line guessing for security protocol analysis. In: Baader, F., Voronkov, A. (eds.) LPAR 2004. LNCS (LNAI), vol. 3452, pp. 363-379. Springer, Heidelberg (2005)
10. Groza, B., Minea, M.: A calculus to detect guessing attacks. In: Samarati, P., Yung, M., Martinelli, F., Ardagna, C.A. (eds.) ISC 2009. LNCS, vol. 5735, pp. 59-67. Springer, Heidelberg (2009)
11. Hole, K.J., Moen, V., Klingsheim, A.N., Tande, K.M.: Lessons from the Norwegian ATM system. IEEE Security and Privacy 5(6), 25-31 (2007)
12. Lomas, T.M.A., Gong, L., Saltzer, J.H., Needham, R.M.: Reducing risks from poorly chosen keys. In: 12th ACM Symp. on Oper. Sys. Princip., pp. 14-22 (1989)
13. Lowe, G.: Analysing protocols subject to guessing attacks. Journal of Computer Security 12(1), 83-98 (2004)

Towards Formal Validation of Trust and Security in the Internet of Services

Roberto Carbone1, Marius Minea2, Sebastian Alexander Mödersheim3, Serena Elisa Ponta4,5, Mathieu Turuani6, and Luca Viganò7
1 Security & Trust Unit, FBK, Trento, Italy
2 Institute e-Austria, Timisoara, Romania
3 DTU, Lyngby, Denmark
4 SAP Research, Mougins, France
5 DIST, Università di Genova, Italy
6 LORIA & INRIA Nancy Grand Est, France
7 Dipartimento di Informatica, Università di Verona, Italy

Abstract

Service designers and developers, while striving to meet the requirements posed by application scenarios, have a hard time assessing the trust and security impact of an option, a minor change, a combination of functionalities, etc., due to the subtle and unforeseeable situations and behaviors that can arise from this panoply of choices. This often results in the release of flawed products to end-users. This issue can be significantly mitigated by empowering designers and developers with tools that offer
easy to use graphical interfaces and notations, while employing established verification techniques to efficiently tackle industrial-size problems. The formal verification of trust and security of the Internet of Services will significantly boost its development and public acceptance.

1 Introduction

The vision of the Internet of Services (IoS) entails a major paradigm shift in the way ICT systems and applications are designed, implemented, deployed and consumed: they are no longer the result of programming components in the traditional meaning but are built by composing services that are distributed over the network and aggregated and consumed at run-time in a demand-driven, flexible way. In the IoS, services are business functionalities that are designed and implemented by producers, deployed by providers, aggregated by intermediaries and used by consumers. However, the new opportunities opened by the IoS will only materialize if concepts, techniques and tools are provided to ensure security.

Deploying services in future network infrastructures entails a wide range of trust and security issues, but solving them is extremely hard since making the service components trustworthy is not sufficient: composing services leads to new, subtle and dangerous vulnerabilities due to interference between component services and policies, the shared communication layer, and application functionality. Thus, one needs validation of both the service components and their composition into secure service architectures.

J. Domingue et al. (Eds.): Future Internet Assembly, LNCS 6656, pp. 193-207, 2011. © The Author(s). This article is published with open access at SpringerLink.com

Standard validation technologies, however, do not provide automated support for the discovery of important vulnerabilities and associated exploits that are already plaguing complex web-based security-sensitive applications, and thus severely affect the development of the future Internet. Moreover,
security validation should be carried out at all phases of the service development process, in particular during the design phase by the service designers themselves or by security analysts that support them in their complex tasks, so as to prevent the production and consumption of already flawed services.

Fortunately, a new generation of analyzers for automated security validation at design time has been recently put forth; this is important not just for the results these analyzers provide, but also because they represent a stepping stone for the development of similar tools for validation at service provision and consumption time, thereby significantly improving the all-round security of the IoS.

In this chapter, we give a brief overview of the main scientific and industrial challenges for such verification tools, and the solutions they provide; we also discuss some concrete case studies and success stories, which provide proof of concept.

As an actual example, we discuss the main ideas and results of one such rigorous technology: the AVANTSSAR Validation Platform (or AVANTSSAR Platform for short) is an integrated toolset that has been developed in the context of the AVANTSSAR project (www.avantssar.eu) for the formal specification and automated validation of trust and security of service-oriented architectures (SOAs). This technology, which involves the design of a suitable specification language and is based on a variety of complementary techniques⁸, has been tuned and proven on a number of relevant industrial case studies.

We also report on our activities in migrating project results to industry and disseminating them to standardization bodies, which will ultimately speed up the development of new network and service infrastructures, enhance their security and robustness, and thus increase the development and public acceptance of the IoS.

We proceed as follows. In Sections 2 and 3, we discuss, respectively, some of the main features of specification languages and
automated validation techniques that have been developed for the verification of trust and security of services. In Section 4, we present the AVANTSSAR Platform and the AVANTSSAR Library, and then, in Section 5, we present some case studies and validation success stories, and the migration of results into industrial practice. Section 6 concludes the chapter.

2 Specification Languages

Modeling and reasoning about trust and security of SOAs is complex due to three main characteristics of service orientation.

First, SOAs are heterogeneous: their components are built using different technology and run in different environments, yet interact and may interfere with each other.

⁸ Such as model checking with constraints, approaches based on SAT (i.e., satisfiability) or SMT (i.e., satisfiability modulo theories), or abstract interpretation.

Second, SOAs are also distributed systems, with functionality and resources distributed over several machines or processes. The resulting exponential state-space complexity makes their design and efficient validation difficult, even more so in hostile situations perhaps unforeseen at design time.

Third, SOAs and their security requirements are continuously evolving: services may be composed at runtime, agents may join or leave, and client credentials are affected by dynamic changes in security policies (e.g., for incidents or emergencies). Hence, security policies must be regarded as part of the service specification and as first-class objects exchanged and processed by services.

The security properties of SOAs are, moreover, very diverse. The classical data security requirements include confidentiality and authentication/integrity of the communicated data. More elaborate goals are structural properties (which can sometimes be reduced to confidentiality and authentication goals) such as authorization (with respect to a policy), separation or binding of duty, and
accountability or non-repudiation. Some applications may also have domain-specific goals (e.g., correct processing of orders). Finally, one may consider liveness properties (under certain fairness conditions); e.g., for a given web service for online shopping one may require that every order will eventually be processed if the intruder cannot block the communication indefinitely. This diversity of goals cannot be formulated with a fixed repertoire of generic properties (like authentication); instead, it suggests the need for specification of properties in an expressive logic.

Various languages have been proposed to model trust and security of SOAs, e.g., BPEL and the π-calculus, to name a few. Each of them, however, focuses only on some aspects of SOAs, and cannot cover all previously described features, except perhaps in an artificial way. One needs a language fully dedicated to specifying trust and security aspects of services, their composition, the properties that they should satisfy, and the policies they manipulate and abide by. Moreover, the language must go beyond static service structure: a key challenge is to integrate policies that are dynamic (e.g., changing with the workflow context) with services that can themselves be added and composed dynamically.

As a concrete solution, in the AVANTSSAR project, we have defined a language, the AVANTSSAR Specification Language ASLan, that is both expressive enough that many high-level languages, such as BPEL, can be translated to it, and amenable to formal analysis (see footnote 9). A key feature of ASLan is the integration of Horn clauses, which are used to describe policies in a clear, logical way, with a transition system that expresses the dynamics of the system; e.g., agents can become members of a group or leave it, with immediate consequences for their access rights.

Footnote 9: The AVANTSSAR Platform also allows users to input their services by specifying them in the high-level formal specification language ASLan++, which we have
defined to be close to specification languages for security protocols/services and to procedural and object-oriented programming languages. The semantics of ASLan++ is formally defined by translation to ASLan.

R. Carbone et al.

As a simple, general (i.e., not AVANTSSAR/ASLan-specific) example, consider the policies that a user U has access to a file F if U belongs to a group G that is the owner of F, or U is the deputy of a user that has access to F:

    $\mathrm{access}(U, F) \leftarrow \mathrm{member}(U, G) \wedge \mathrm{owner}(G, F)$
    $\mathrm{access}(U, F) \leftarrow \mathrm{deputy}(U, U') \wedge \mathrm{access}(U', F)$

Policies are dynamic, since facts like member, owner, and deputy can change over time, which in turn affects access rights. For instance, if user Alice changes to another group within the organization, she will immediately obtain all access rights to files of the new group, but lose access rights to files of her old group, except for those that she maintains due to her being a deputy for other users.

We consider transition systems in which a state is a set of facts like member, owner, etc.; they can be used to describe service workflows and steps in security protocols. For instance, an employee (Alice) changing group membership at the command of her manager (Peter) can be formalized as:

    $\mathrm{member}(\mathrm{Alice}, g_1) \wedge \mathrm{isManager}(\mathrm{Peter}, \mathrm{Alice}) \wedge \mathrm{canAssign}(\mathrm{Peter}, g_2) \Rightarrow \mathrm{member}(\mathrm{Alice}, g_2) \wedge \mathrm{isManager}(\mathrm{Peter}, \mathrm{Alice}) \wedge \mathrm{canAssign}(\mathrm{Peter}, g_2)$

The above transition is applicable in a state that includes all facts on the left-hand side. When the transition is applied, Alice's membership to $g_1$ is removed, she is added as member to [...]

[Fig 1 MS-CHAP v2 protocol and ASLan transition rule. Surviving fragment of the rule: state_A(A, ID, 2, B, Kab, H, Na, Nb).iknows(pair(Na, apply(H, pair(Kab, pair(Na, pair(Nb, A))))))]

Let $\mathcal{F}$ be the set of ground facts; the set of all possible states is then $\mathcal{S} = 2^{\mathcal{F}}$. An ASLan model defines a transition system $M = (\mathcal{S}, \mathcal{I}, \rightarrow)$, where $\mathcal{I} \subseteq \mathcal{S}$ is the set of initial states and $\rightarrow \subseteq \mathcal{S} \times \mathcal{S}$ is the transition relation. In an ASLan model, the set of initial states is a conjunction of facts. Transitions are rewrite rules where both sides are conjunctions of facts. A transition can be taken from any state that contains the facts on the left-hand side; these are removed from the state and replaced by the facts on the right-hand side. As an exception, iknows (intruder knowledge) is a persistent fact and does not disappear, even if it is written on the left-hand side and omitted on the right.

Formally, we first define the closure $\lceil S \rceil^H$ of a state $S$ with respect to the set $H$ of Horn clauses in the model as the set of all ground facts that can be derived from $S$ using $H$. More precisely, $\lceil S \rceil^H$ is the smallest set containing $S$ such that

    $\forall (F \leftarrow F_1, \ldots, F_n) \in H,\ \forall \sigma:\ F_1\sigma, \ldots, F_n\sigma \in \lceil S \rceil^H \implies F\sigma \in \lceil S \rceil^H$.

A transition rule has the form $PF.NF =[V]\Rightarrow R$, where $PF$ is a set of positive facts, $NF$ is a set of negative (negated) facts, $V$ is a set of freshly introduced variables, and the right-hand side $R$ is a conjunction of facts. We can now define the transition relation $\rightarrow$ as follows: there is a transition $S \rightarrow S'$ iff there exists a transition rule $PF.NF =[V]\Rightarrow R$ and a substitution $\sigma$ from the variables of $PF$ to ground terms such that the following conditions hold:

- $PF\sigma \subseteq \lceil S \rceil^H$, i.e., the positive facts on the left-hand side hold in $\lceil S \rceil^H$;
- $NF\sigma\sigma' \cap \lceil S \rceil^H = \emptyset$ for all substitutions $\sigma'$ such that $NF\sigma\sigma'$ is ground, i.e., the negative facts cannot hold in $\lceil S \rceil^H$;
- $S' = (S \setminus PF\sigma) \cup R\sigma\sigma''$, where $\sigma''$ is any substitution such that for all $v \in V$, $v\sigma''$ does not occur in $S$ (i.e., variables in $V$ are substituted with fresh terms).

The combination of transition rules and Horn clauses in the language implies the existence of two kinds of facts. Explicit facts are introduced by the right-hand side of transition rules and are persistent unless removed by a later transition (if present on the left-hand side but not the right-hand side). Implicit facts are introduced by Horn clauses and are recomputed as part of the state closure after each transition step. To ensure a consistent semantics, explicit facts (including the intruder knowledge iknows) cannot appear in the conclusion of a Horn clause. This impacts our design of guessing rules, which must add intruder knowledge.

These definitions lead to an execution model for an ASLan specification that alternates Horn clause deductions and transition steps: first, the set of facts comprising a state is augmented by the facts obtained by performing the transitive closure of the Horn clauses, and then one of the applicable transition rules is chosen and executed, after which the entire process is repeated. In particular, this makes Horn clauses suitable for modelling intruder deduction and any additional processing necessary for attack detection, as Horn clause deductions are performed after each transition step.

3 Customized transitions for detection of DoS attacks by resource exhaustion

We formalize costs and attack conditions in order to detect DoS attacks by resource exhaustion. While we focus on computation resources due to the varying cost of cryptographic
primitives, costs could be associated with memory consumption or other resources as well.

Resource exhaustion DoS attacks can be divided into two categories according to the behaviour of the adversary: one is abusive use of the service by clients that, willingly or not, deplete the server's resources; the other is malicious use, in which adversaries manipulate protocol messages and make honest principals waste computational time without reaching protocol goals. For the first case, we consider an attack feasible if the initiator can force repeated use of the protocol, which leads to resource depletion. For the second case, we consider the protocol under attack when principals reach states in which their beliefs about the protocol are wrong, e.g., messages are accepted from impersonated senders. Cutting down communication is not an issue, since the intruder can do this for any protocol and protocol design cannot provide countermeasures against it.

In both cases, to deem a resource exhaustion attack successful we must evaluate costs for both the adversary and honest principals. An attack is flagged as successful when the cost of the adversary is lower and one of two situations holds: the adversary is the initiator, or the principal's beliefs are wrong.

3.1 Defining costs and augmenting transitions

Costs can be treated according to the framework of Meadows, which uses a monoid structure; this approach is also used in follow-up related work. The cost set employed is $S = \{0, \mathit{low}, \mathit{medium}, \mathit{high}\}$, and the sum of two costs is simply defined as their maximum: $\forall a, b \in S,\ a + b = \max(a, b)$. This can be easily modeled in ASLan by using a fact for summing costs, as shown in Figure 2, where cost values are of type text and sum has the signature sum: text * text * text -> fact. In the same manner, the comparison between cost values is defined with the fact less.

The existing AVANTSSAR model checkers have limited support for numeric values. Using SMT-based techniques would allow for integer costs and
a better evaluation of complex attacks such as distributed DoS, where a more sensitive cost analysis must be done. For example, the initial cost of the adversary can be high, but it can be alleviated over multiple protocol sessions. Only a few manual analyses have been done with explicit numeric cost values; most analyses in the literature are symbolic, using a monoid as cost structure.

Transitions can be easily augmented by costs. This has to be done for both protocol steps (as described in detail elsewhere) and intruder deductions. Figure 2 shows the cost definition and an intruder deduction modeled as a protocol transition that keeps track of cost. The example is a deduction in which the intruder performs a signature with key Y over term X, denoted by costSig(X,Y), incurring cost high, with the initial condition that he knows both X and Y.

    sum(low, low, low)          sum(low, medium, medium)
    sum(medium, low, medium)    sum(low, high, high)
    sum(high, low, high)        sum(high, medium, high)
    sum(medium, high, high)     sum(high, high, high)
    less(low, medium)    less(medium, high)    less(low, high)

    step trans_1(X, Y, Cost, NewCost, ID) :=
        state_adv(i, ID, 0).iknows(X).iknows(Y).
        cost(i, Cost).sum(Cost, high, NewCost)
      =>
        state_adv(i, ID, 0).iknows(costSig(X,Y)).
        cost(i, NewCost).sum(Cost, high, NewCost)

Fig 2 Defining costs (top) and a cost-augmented transition for a signature (bottom)

3.2 Defining the attack condition

To flag an attack on principal P, a necessary condition is that the intruder cost is less than the cost incurred by P: cost(i, Ci).cost(P, CP).less(Ci, CP). In addition, for abusive use, we need to keep track of the protocol initiator. This can be done by augmenting the initial transition of the protocol (done by principal A) with the fact initiate(A) and adding initiate(i) in the attack condition (the attack must be repeatable by the intruder). For malicious use, we track the violation of injective agreement. This can be done by augmenting the right-hand side of each send and receive
transition with the facts send(S, R, M, L, ID) and, respectively, recv(S, R, M, L, ID), having as parameters the sender, recipient, content, protocol step and instance. The attack is flagged by checking satisfiability of the condition recv(S, R, M, L, ID).not(send(S, R, M, L, ID)), which means that a received message does not have a matching send.

These attack conditions can be further refined along other criteria, such as determining whether the attack is detectable or not by a given principal, or by any honest principal, etc. A more difficult issue is handling costs over multiple sessions. In this case, principals must not accumulate costs from correct protocol runs, but only from sessions initiated by the adversary or from malicious sessions. This requires rewriting each protocol transition in several ways, keeping track of these conditions, and tracking costs either per session or per principal. Modelling details are given elsewhere.

As an example, we discuss the Station-to-Station protocol (STS) depicted in Figure 3. The protocol computes a shared session key $k = \alpha^{xy}$ starting from the random values x and y chosen by the two participants.

    Station-to-Station protocol:
    1. A -> B : α^x
    2. B -> A : α^y, Cert_B, E_k(sig_B(α^y, α^x))
    3. A -> B : Cert_A, E_k(sig_A(α^x, α^y))

    Lowe's attack:
    A -> Adv(B) : α^x
    Adv -> B    : α^x
    B -> Adv    : α^y, Cert_B, E_k(sig_B(α^y, α^x))
    Adv(B) -> A : α^y, Cert_B, E_k(sig_B(α^y, α^x))
    A -> Adv(B) : Cert_A, E_k(sig_A(α^x, α^y))

Fig 3 Station-to-Station protocol (top) and Lowe's attack (bottom)

Lowe's attack, in the second part of Figure 3, shows the adversary capturing the message sent by A to B and resending it in his own name to B. Afterwards, B is talking to Adv, while A believes she is talking to B. Adv(B) means the adversary impersonating B, while Adv is the adversary acting as himself. The attack found by Lowe shows a flaw in the protocol, irrespective of costs. Later, Meadows analyzed this attack from a cost-based perspective.

Our model allows a model checker to detect this attack automatically. By using an attack condition (attack
state) such as

    dos_on_a(X, Y, P, V, L, ID) :=
        cost(a, X).cost(i, Y).less(Y, X).
        recv(P, a, V, L, ID) & not(send(P, a, V, L, ID))

we direct the model checker to find a protocol trace in which the adversary has lower cost than the honest principal A, who accepts a message from a different session. Figure 4 presents the attack trace found by CL-Atse (release 2.5-8). The attack differs slightly from the one found by Lowe, but by placing different constraints the back-end can reproduce Lowe's attack as well.

The trace shows the adversary reusing a value sent by A to obtain a response from B that is further redirected and accepted by A. Steps 1 to 3 are from A's session with B (compromised by the intruder in step 3), while step 2' is from a session between the intruder and B (step 1' is implicit since the intruder knows everything sent over the network). The cost of both A and B is high, as they compute modular exponentiations, while their beliefs about the resulting shared session key ($\alpha^{xy}$) are both wrong. A believes she shares a key with B, while B believes he shares the key with Adv, who is actually unable to compute it without knowing x. The cost of the adversary is low, as he performs no computations except for redirecting messages, which is assumed to be cheap.

    2.  B -> i(A) : α^y1, Cert_B, E_k(sig_B(α^y1, α^x))
    2'. B -> i    : α^y2, Cert_B, E_k(sig_B(α^y2, α^x))
    3.  i(B) -> A : α^y2, Cert_B, E_k(sig_B(α^y2, α^x))

    state_process(A, ID, 0).ihears(AtomText).
    contains(AtomText, snull).replace(AtomText, AtomText)

    state_process(A, ID, 0).ihears(pair(T1, T2)).
    contains(T1, S1).contains(T2, S2).
    replace(T1, T1New).replace(T2, T2New)
      =>
    state_process(A, ID, 0).ihears(pair(T1, T2)).
    contains(T1, S1).contains(T2, S2).
    replace(T1, T1New).replace(T2, T2New).
    contains(pair(T1, T2), S1).contains(pair(T1, T2), S2).
    replace(pair(T1, T2), pair(T1New, T2New))

Fig 5 Contains and replace for atomic terms (top) and composed terms (bottom)

Improved approach with transitions

The previous approach is inefficient because the steps for customized intruder deductions can be interleaved with protocol steps, leading to exponential complexity. To avoid this, we control the order in which the terms are processed by placing them in a stack (constructed with pair and a dummy separator). Terms are processed by structural decomposition, and each new sub-term is placed on top of the stack unless it is an atom, in which case the contains and replace facts for it can be directly deduced. Clearly, an atom contains the secret if and only if it is the actual secret; otherwise contains is false and replace leaves the atom unchanged.

For example, consider the term scrypt(k, h(pair(Na, Nb))) heard over the network. In the first step the stack contains only this term. Next, k (the left operand of scrypt) is added to the top of the stack. As this operand is atomic, one can directly establish contains and replace for it, and remove it from the stack. The next item on the stack will be the right operand of scrypt, i.e., h(pair(Na, Nb)). The next element is pair(Na, Nb), with the stack now containing three items, and so on. In the first rule of Figure 6 an atom of type text is eliminated from the list, while in the second rule a composed term is split into its components. This mechanism greatly reduces complexity due to interleaving. As can be seen in Table 1, this modelling variant succeeds in deriving the guess also for
the artificially complicated structure of MS-CHAP**. However, its time requirement increases significantly with the complexity of the term, a drawback removed in the next modelling solution.

    state_process(A, ID, 1).
    process(pair(pair(AtomText, sep), Right)).
    not(guessable(AtomText))
      =>
    state_process(A, ID, 0).process(Right).
    contains(AtomText, snull).checked(AtomText).
    replace(AtomText, AtomText).replaced(AtomText)

    state_process(A, ID, 1).
    process(pair(pair(pair(T1, T2), sep), Right)).
    contains(T1, S1).replace(T1, T1New).
    contains(T2, S2).replace(T2, T2New)
      =>
    state_process(A, ID, 0).process(Right).
    contains(T1, S1).replace(T1, T1New).
    contains(T2, S2).replace(T2, T2New).
    checked(pair(T1, T2)).
    contains(pair(T1, T2), pair(S1, S2)).
    replace(pair(T1, T2), pair(T1New, T2New)).
    replaced(pair(T1, T2))

Fig 6 Improved contains/replace for atomic terms (top) and composed terms (bottom)

Efficient approach with Horn clauses

Horn clauses are more elegant and intuitive for modelling intruder deductions. They were specially introduced in ASLan for this purpose, as well as for modelling static or dynamic security policies. The model checkers of the AVANTSSAR platform process Horn clauses in different ways, depending on their overall exploration strategy. CL-Atse employs a backward search, using Horn clauses only when deriving some fact is required (e.g., for the left-hand side of a transition). SATMC, on the other hand, employs a forward strategy and saturates the set of known facts by transitively applying Horn clauses after each transition. Thus, Horn clauses written with one search strategy in mind may lead a model checker employing the opposite strategy to non-termination.

We have devised models adapted to the use of CL-Atse. For example, the Horn clauses in Figure 7 find both the part of a term and its remainder. The fact ispart(T1, T2, T3) denotes that T1 is split into disjoint parts T2 and T3. The Horn clause part_left states that T2 is part of pair(T0, T1) with remainder pair(T0, T3) if T2 is part of
T1 with remainder T3. Such rules need to be written for all operators that can be applied on terms.

    hc part_null(T1) := ispart(T1, null, T1)
    hc part_id(T1) := ispart(T1, T1, null)
    hc part_left(T0, T1, T2, T3) :=
        ispart(pair(T0, T1), T2, pair(T0, T3)) :- ispart(T1, T2, T3)
    hc part_right(T0, T1, T2, T3) :=
        ispart(pair(T0, T1), T2, pair(T3, T1)) :- ispart(T0, T2, T3)
    hc part_scrypt_left(T0, T1, T2, T3) :=
        ispart(scrypt(T0, T1), T2, pair(T0, T3)) :- ispart(T1, T2, T3)
    hc part_scrypt_right(T0, T1, T2, T3) :=
        ispart(scrypt(T0, T1), T2, pair(T3, T1)) :- ispart(T0, T2, T3)

Fig 7 Splitting terms using Horn clauses

    MS-CHAP v2:
    1. A -> B : A
    2. B -> A : Nb
    3. A -> B : Na, H(k_AB, Na, Nb, A)
    4. B -> A : H(k_AB, Na)

    Attack trace:
    i -> (chap_Init, 11) : start
    (chap_Init, 11) -> i : a
    i -> (chap_Resp, 18) : a
    (chap_Resp, 18) -> i : n4(Nb)
    i -> (chap_Init, 13) : n4(Nb)
    (chap_Init, 13) -> i : pair(n2(Na), h(pair(s, pair(n2(Na), pair(Nb(2), a)))))
    i -> (chap_Resp, 20) : pair(n2(Na), h(pair(s, pair(n2(Na), pair(n4(Nb), a)))))
    (chap_Resp, 20) -> i : h(pair(s, n2(Na)))

    Horn clause facts:
    controls(h(pair(s,n2(Na))),s), iguess(s), ihears(h(pair(s,n2(Na)))),
    ispart(h(pair(s,n2(Na))),h(pair(s,n2(Na))),null),
    ispart(s,pair(s,n2(Na)),pair(n2(Na),null)),
    ispart(s,s,null), observes(h(pair(s,n2(Na))),s)

Fig 8 MS-CHAP v2 and the corresponding attack trace found by CL-Atse

In the attack trace from Figure 8 the intruder was forced to guess k_AB from the message in step 4, although it could also have guessed at step 3. The Horn clauses show that the intruder observes the term from step 4, and repeatedly applies rules involving ispart until it can derive controls, which then allows the guess. Rules for observes and controls are discussed in the next subsection. Table 1 shows that with this approach the increase in time requirements is negligible for more complex terms, which otherwise require several seconds of processing, or even fail if naive transition rules are used for term processing.

                             MS-CHAP    MS-CHAP*    MS-CHAP**
    Naive Transitions        456 ms     820 ms      TOUT
    Efficient Transitions    1272 ms    1812 ms     10529 ms
    Horn Clauses             120 ms     120 ms      112 ms

Table 1 Timing results for attack detection on MS-CHAP with CL-Atse

4.3 Using Horn clauses and transitions for intruder deductions

Finally, to flag a guessing attack, we need to determine whether some term is verifiable by the intruder. Figure 9 shows rules for the verifiability conditions discussed previously. To achieve this, in some cases we need to add terms to the intruder knowledge, as shown in Figure 10. This is important for modelling: while Horn clauses can be used for verifying terms, they cannot be used to add terms to intruder knowledge when working with CL-Atse, due to the backward strategy it employs when using Horn clauses (SATMC, however, can do this, due to the forward strategy it employs). The two guessing cases are detected by the Horn clauses in Figure 11. Thus, to validate the guess we have to use a mixture of Horn clauses and intruder transitions. Using these, CL-Atse is able to detect guessing from terms such as {m, m}_s or k, {{m, m}_k}_s, etc.

4.4 Distinguishing detectable from undetectable on-line attacks

With the guessing mechanism established above, the attack condition can be stated in different flavours. For example, as the deduction rules allow detecting on-line attacks, we can ask whether the attack is detectable or not by some (or any) honest participant. The relevance of this kind of undetectable on-line attack was previously pointed out by Ding and Horster.

We can express that guessing is undetectable for honest participants if, for all executions where guessing happens, the protocol is completed normally by all participants. Thus, we can reformulate undetectable guessing as a reachability check for an attack state in which the secret has been guessed and all protocol participants have completed execution.

In ASLan models, each participant has a unique identifier ID which is part of its state fact. We also define for each participant the fact running(ID)
which is true in every state except the participant's initial and final states. An adversary observes (controls) an oracle undetectably if it observes (controls) the oracle and all protocol participants reach a final state, i.e., no fact running(ID) holds.

    % verify known term
    hc verif_iknows(MsgA) :=
        verifiable(MsgA) :- iknows(MsgA)
    % verify signature
    hc verif_sign(PbK, MsgA) :=
        verifiable(apply(inv(PbK), MsgA)) :- iknows(PbK), iknows(MsgA)
    % verification of term under hash
    hc verif_hash(MsgA, MsgB, MsgC) :=
        verifiable(MsgA) :- iknows(apply(h,MsgB)),
            ispart(apply(h,MsgB), MsgA, MsgC), iknows(MsgC)
    % the ciphertext is verifiable if the encryption key is known
    % and part of the plaintext is verifiable
    hc verif_scrypt_ciphertext(K, MsgA, MsgB, MsgC) :=
        verifiable(scrypt(K, MsgA)) :- iknows(K),
            split(MsgA, MsgB, MsgC), verifiable(MsgC)

Fig 9 Horn clauses for verifying terms

    % split a message if it was not split before
    step trans_split(A, MsgA, MsgB, MsgC, K) :=
        state_split(A).ihears(scrypt(K, MsgB)).
        ispart(MsgB, MsgA, MsgC).
        not(equal(MsgC, null)).not(is_split(MsgB))
      =>
        state_split(A).ihears(scrypt(K, MsgB)).
        ispart(MsgB, MsgA, MsgC).iknows(MsgA).
        split(MsgB, MsgA, MsgC).is_split(MsgB)

Fig 10 Transition for adding terms to intruder knowledge

A protocol description can be automatically augmented to allow for this check by statically identifying its initial and final transitions. Initial transitions have in the LHS a state fact, whereas final transitions have in the RHS a state fact that does not appear in the LHS of any other transition rule. Every initial transition is augmented to generate a fresh ID value, and the positive fact running(ID) is added to its RHS. Every final transition is augmented with the fact running(ID) on the LHS, but not on the RHS, so that it becomes false. This protocol adaptation allows undetectable guessing to be expressed directly.

The same technique can be used to distinguish offline attacks. This is achieved by checking for attacks, while requiring that no
fresh ID is ever generated. It may also be useful to check, for instance, if only adversary observe actions were done on-line, while controls, which involves computations and is more tedious, is performed offline. This can be done by checking that no fresh ID is generated between observes and controls. Thus, our approach allows not only the detection of guessing attacks, but also their classification.

    hc controls_hash(S, K, Rest, Msg) :=
        controls(apply(h, Msg), S) :- ihears(apply(h, Msg)),
            ispart(S, Msg, Rest), iknows(Rest)
    hc observes_hash(S, K, Rest, Msg) :=
        observes(apply(h, Msg), S) :- ihears(apply(h, Msg)),
            ispart(S, Msg, Rest)
    hc guess_case_i(S, Msg) :=
        iguess(S) :- lowentropy(S), observes(Msg, S), controls(Msg, S)
    hc controls_scrypt(S, K, KRest, Msg) :=
        controls(scrypt(K, Msg), S) :- ihears(scrypt(K, Msg)),
            ispart(S, K, KRest), iknows(KRest)
    hc observes_scrypt(S, K, KRest, Msg) :=
        observes(scrypt(K, Msg), S) :- ihears(scrypt(K, Msg)),
            ispart(S, K, KRest)
    hc guess_case_ii(S, K, MsgA, MsgB, MsgC) :=
        iguess(S) :- lowentropy(S), observes(scrypt(K, MsgA), S),
            controls(scrypt(K, MsgA), S),
            split(MsgA, MsgB, MsgC), verifiable(MsgC)

Fig 11 Horn clauses for guessing

5 Conclusions

As model checkers for security protocols do not by default support the detection of all attacks, one needs to use customized intruder deductions and transitions for this purpose. This allows the handling of new types of attacks without changing the model-checking back-ends. In this paper, we have explored two such case studies: modelling guessing attacks and denial of service by resource exhaustion. These attacks are relevant, as many protocols used in practice are vulnerable to them, and we show the applicability of our theories with automatically obtained attack traces on known protocols.

We present different modelling options and investigate the relative efficiency of transition rules and Horn clauses, with the latter providing a significant performance gain and allowing the processing of more
complex message terms. The modelling approaches described here show the power of the ASLan specification language, which serves as input to the AVANTSSAR model checkers. We hope that the approaches shown here can provide a starting point for modelling other types of attacks that are currently not detected by default.

References

1. Abadi, M., Baudet, M., Warinschi, B.: Guessing attacks and the computational soundness of static equivalence. In: Proc. 9th Int'l Conf. on Foundations of Software Science and Computation Structures. pp. 398-412. LNCS vol. 3921, Springer (2006)
2. Armando, A., Compagna, L.: SAT-based model-checking for security protocols analysis. International Journal of Information Security 7(1), 3-32 (2008)
3. AVANTSSAR: Deliverable 2.3 (update): ASLan++ specification and tutorial (2011), http://www.avantssar.eu
4. Basin, D.A., Mödersheim, S., Viganò, L.: OFMC: A symbolic model checker for security protocols. Internat. J. of Information Security 4(3), 181-208 (2005)
5. Blanchet, B.: An efficient cryptographic protocol verifier based on Prolog rules. In: Proc. 14th IEEE Computer Security Foundations Workshop. pp. 82-96 (2001)
6. Corin, R., Doumen, J.M., Etalle, S.: Analysing password protocol security against off-line dictionary attacks. In: Proc. 2nd Int'l Workshop on Security Issues with Petri Nets and other Computational Models (WISP). pp. 47-63 (2004)
7. Corin, R., Malladi, S., Alves-Foss, J., Etalle, S.: Guess what? Here is a new tool that finds some new guessing attacks. In: Proc. Workshop on Issues in the Theory of Security. pp. 62-71 (2003)
8. Diffie, W., van Oorschot, P.C., Wiener, M.J.: Authentication and authenticated key exchanges. Designs, Codes and Cryptography 2(2), 107-125 (1992)
9. Ding, Y., Horster, P.: Undetectable on-line password guessing attacks. Operating Systems Review 29(4), 77-86 (1995)
10. Groza, B., Minea, M.: A formal approach for automated reasoning about off-line and undetectable on-line guessing. In: Proc. 14th Int'l Conf. on Financial Cryptography and Data Security. pp. 391-399. LNCS vol. 6052, Springer (2010)
11. Groza, B., Minea, M.: Formal modelling and automatic detection of resource exhaustion attacks. In: Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security (ASIACCS) (2011)
12. Hankes Drielsma, P., Mödersheim, S., Viganò, L.: A formalization of off-line guessing for security protocol analysis. In: Proc. 11th Int'l Conf. on Logic for Programming, Artificial Intelligence, and Reasoning. pp. 363-379. LNCS vol. 3452, Springer (2005)
13. Lowe, G.: Some new attacks upon security protocols. In: Proc. of the 9th IEEE Computer Security Foundations Workshop. pp. 162-169 (1996)
14. Lowe, G.: Analysing protocols subject to guessing attacks. Journal of Computer Security 12(1), 83-98 (2004)
15. Matsuura, K., Imai, H.: Modification of Internet key exchange resistant against denial-of-service. In: Pre-Proceedings of Internet Workshop. pp. 167-174 (2000)
16. Meadows, C.: A cost-based framework for analysis of denial of service networks. Journal of Computer Security 9(1/2), 143-164 (2001)
17. Ramachandran, V.: Analyzing DoS-resistance of protocols using a cost-based framework. Tech. Rep. DCS/TR-1239, Yale University (2002)
18. Smith, J., Gonzalez Nieto, J.M., Boyd, C.: Modelling denial of service attacks on JFK with Meadows's cost-based framework. In: Proc. of the 4th Australasian Information Security Workshop. pp. 125-134 (2006)
19. Turuani, M.: The CL-Atse protocol analyser. In: Proc. of the 17th Int'l Conference on Term Rewriting and Applications. pp. 277-286. LNCS vol. 4098, Springer (2006)
20. Zorn, G.: Microsoft PPP CHAP extensions, version 2 (2000)

Formal Methods in System Design, 21, 251-280, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Combining Software and Hardware Verification Techniques

ROBERT P. KURSHAN    k@research.bell-labs.com
VLADIMIR LEVIN    levin@research.bell-labs.com
Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974, USA

MARIUS MINEA    marius+@cs.cmu.edu
Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA

DORON PELED    doron@research.bell-labs.com
HUSNU YENIGUN    husnu@research.bell-labs.com
Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974, USA

Received June 6, 2000; Accepted December 3, 2001

Abstract. Combining verification methods developed separately for software and hardware is motivated by the industry's need for a technology that would make formal verification of realistic software/hardware co-designs practical. We focus on techniques that have proved successful in each of the two domains: BDD-based symbolic model checking for hardware verification and partial order reduction for the verification of concurrent software programs. In this paper, we first suggest a modification of partial order reduction, allowing its combination with any BDD-based verification tool, and then describe a co-verification methodology developed using these techniques jointly. Our experimental results demonstrate the efficiency of this combined verification technique, and suggest that for moderate-size systems the method is ready for industrial application.

Keywords: formal verification, model checking, hardware/software co-design, partial order reduction

1 Introduction

Software and hardware verification, although having a lot in common, have developed along different paths. Even in the specific context of model checking, in which the system is represented as a
graph or an automaton, several differences become apparent. Software systems typically use an asynchronous model of execution, in which concurrent actions of component modules are interleaved. In verification, the asynchrony is exploited using partial order reduction, which explores during verification only a subset of the available actions from each state. The remaining actions are delayed to a subsequent step, as long as this does not result in any change visible to the specification being checked. On the other hand, hardware is typically designed for synchronous execution. All component modules perform an action at each execution step. Hardware verification usually exploits the regularity of digital circuits, often built from many identical units, by representing the state space using binary decision diagrams (BDDs). Another technique which makes hardware verification manageable is localization reduction, which abstracts away the parts of the hardware design that are irrelevant to the verified property.

Thus, traditionally, formal verification of hardware and software is done through different techniques, using tools which are based on different algorithms, representations and principles. However, there are important and growing classes of mixed (combined) hardware-software systems, co-designs, in which hardware and software are tightly coupled. The tight coupling precludes testing the hardware and software separately. On the other hand, there may be 100 hardware steps for one software step. The difference renders conventional simulation/test exceedingly inefficient, and results in co-design systems that cannot be effectively tested by conventional means.

New testing methods and commercial tools to support them have emerged to address this problem. For example, we refer the reader to the websites of major EDA vendors, where co-verification (as a matter of fact, co-simulation) tools are more and more heavily promoted: there are too many of them to mention here. They
generally involve ad hoc abstraction of the hardware (such as removing clock dependencies), in order to speed up the simulation of the hardware relative to the software. These methods may result both in missed design errors and in false error indications that reflect errors in the abstraction, not the design. Nonetheless, the design community is forced into these new methods as the only available alternative.

Yet there is another alternative, based on model checking. Formal verification in this context has all the well-known advantages over simulation test: it offers better coverage, and it may be applied sooner in the design cycle. It is also able to deal in a sound fashion with the interface between hardware and software, in particular with different speed rates on the two sides. This motivates our efforts to introduce formal verification into the area of co-design systems. For this, an efficient verification technique is needed that is able to address co-design systems containing both kinds of components, hardware and software.

In this paper, we attempt to combine the benefits of both methodologies: we suggest a verification technique that combines partial order reduction with a BDD representation and, in general, with hardware verification techniques. The partial order reduction principle of selecting a subset of the enabled actions from each state poses no problem when combined with BDDs or localization reduction. It only means that the transition relation needs to be restricted so that it takes advantage of the potential commutativity between concurrent actions. The idea that this can be done statically, at compile time, was suggested in earlier work, but the implementation there required some changes in the depth-first search algorithm of the Spin model checker (in order to control the backtracking mechanism).

The subtle point that has so far made explicit state enumeration seem more appropriate than BDD-based symbolic exploration for implementing the partial order reduction algorithm is the
cycle closing problem. Since partial order reduction may defer an action in favor of another one that can be executed concurrently, one needs to ensure that no action is ignored indefinitely along a cycle in the state space. One solution to this problem was proposed, and later elaborated, in the first attempt at combining partial order reduction with BDDs. That solution is based on a conservative approximation of when a cycle may be closed during a breadth-first search. Essentially, when an edge connects a node to another node that is at the same or a lower level in the breadth-first search, it is assumed (conservatively) to close a cycle.

We propose an alternative solution that computes at compile time the conditions which guarantee that no action is ignored. The method is based on the observation that any cycle in the global state space projects to a local cycle in each participating process. These local cycles can be detected at compile time. An action from each cycle is selected such that, at run time, the execution of each selected action forces a complete exploration of all actions that have been deferred so far in favor of other actions. The number of these special actions (the more of them there are, the smaller the achieved reduction) can be minimized by analyzing the effects of transitions.

Our implementation of the algorithm has the unique feature that all the information needed for performing the partial order reduction is obtained during a compilation of the software system model. There is no change at all in the verification tool, in this case the model checker COSPAN. Thus, with the new algorithm, partial order reduction is implemented as a compilation or preprocessing phase for model checking, rather than as a modified model checking algorithm. It is precisely this feature that allows a combination of partial order reduction with BDD-based algorithms and, in general, with any optimization technique
applied by the model checker.

Can we gain by combining software- and hardware-oriented verification techniques, partial order reduction and BDDs? We answer this question affirmatively: the combination of partial order reduction and hardware-oriented verification techniques makes possible hardware/software co-verification, i.e., the integral verification of a hardware/software co-design. In particular, there are examples in this area for which the use of a single method (be it BDDs or partial order reduction) runs out of memory due to state space explosion, whereas the combination of the two methods makes verification possible (see Section 5).

The remainder of the paper is organized as follows. The next section explains the modification of partial order reduction aimed at its combination with any existing model checker, in particular with one based on BDDs. Section 3 presents a co-verification methodology that makes use of the combination of partial order reduction with hardware-oriented verification techniques. Section 4 describes our current implementation of this co-verification technique, with an emphasis on modifying the transition relation according to the partial order reduction constraints. Section 5 presents experimental results and Section 6 the conclusion.

2. Partial order reduction

In Sections 2.1 and 2.2, we present the basics of the temporal logic LTL and of the partial order reduction technique. Sections 2.3 and 2.4 describe the modification to partial order reduction required to fit our needs.

2.1 Preliminaries

The system to be analyzed is viewed as a state graph. If $S$ is the set of states, a transition is a relation $\alpha \subseteq S \times S$. A state graph is defined as a tuple $M = (S, S_0, T, L)$, where $S_0 \subseteq S$ is the set of initial states and $T$ is the set of transitions. The labeling function $L : S \to 2^{AP}$ associates each state $s$ of $M$ with the set of atomic propositions that hold at $s$. A transition $\alpha \in T$ is enabled at state $s \in S$ if there exists a state $s'$ in
$S$ such that $(s, s') \in \alpha$. Otherwise $\alpha$ is said to be disabled at $s$. For a state $s$, $enabled(s)$ is the set of all transitions $\alpha$ such that $\alpha$ is enabled at $s$. A transition $\alpha$ is called deterministic if for any state $s \in S$ in which $\alpha$ is enabled there is a unique $s'$ such that $(s, s') \in \alpha$. In this case $\alpha$ can be viewed as a partial function on $S$, and the notation $s' = \alpha(s)$ can be used instead of $(s, s') \in \alpha$. In this paper, we restrict ourselves to state graphs with only deterministic transitions. Yet nondeterminism may appear as a nondeterministic selection among several enabled transitions. In order to simplify the picture, we avoid states from which no transition is possible; for such (i.e., deadlocked) states, we force $T$ to have the self-looping transition $\delta = \{(s, s) \mid s \text{ has no successors except } s\}$.

An execution sequence $\sigma$ of a state graph $M$ is an infinite alternating sequence of states $s_i$ and transitions $\alpha_i$,
$\sigma = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots$,
such that $s_{i+1} = \alpha_i(s_i)$ for all $i$. If $s_0 \in S_0$ then $\sigma$ is referred to as a full execution sequence. We denote by $\sigma^i$ the suffix of $\sigma$ that starts from its $i$th element-state, i.e., $\sigma^i = s_i \xrightarrow{\alpha_i} s_{i+1} \xrightarrow{\alpha_{i+1}} s_{i+2} \cdots$.

For assertions about the behavior of a program, we use the temporal logic LTL. Given a set $AP$ of atomic propositions, LTL formulas are defined as follows:
- for all $p \in AP$, $p$ is a formula;
- if $\varphi$ and $\psi$ are formulas, then so are $\neg\varphi$, $\varphi \land \psi$, $\bigcirc\varphi$, and $\varphi \, U \, \psi$.

An execution sequence $\sigma = s_0 \xrightarrow{\alpha_0} s_1 \cdots$ is said to satisfy an LTL formula $\varphi$ (denoted by $\sigma \models \varphi$) under the following conditions:
- if $\varphi = p$ for some $p \in AP$, and $p \in L(s_0)$;
- if $\varphi = \neg\psi$, and not $\sigma \models \psi$;
- if $\varphi = \psi_1 \land \psi_2$, and $(\sigma \models \psi_1) \land (\sigma \models \psi_2)$;
- if $\varphi = \bigcirc\psi$, and $\sigma^1 \models \psi$;
- if $\varphi = \psi_1 \, U \, \psi_2$, and there exists an $i \ge 0$ such that $\sigma^i \models \psi_2$ and, for all $0 \le j < i$, $\sigma^j \models \psi_1$.

Two execution sequences, $\sigma = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots$ and $\rho = r_0 \xrightarrow{\beta_0} r_1 \xrightarrow{\beta_1} \cdots$, are called stuttering equivalent (denoted by $\sigma \equiv_{st} \rho$) if there exist two infinite sequences of indices $0 = i_0 < i_1 < \cdots$ and $0 = j_0 < j_1 < \cdots$ such that, for every $k \ge 0$,
$L(s_{i_k}) = L(s_{i_k+1}) = \cdots = L(s_{i_{k+1}-1}) = L(r_{j_k}) = L(r_{j_k+1}) = \cdots = L(r_{j_{k+1}-1})$.
Intuitively, two execution sequences are stuttering equivalent if they have identical
state labelings after, in each of them, any finite sequence of identically labeled states is collapsed into a single state. Two state graphs $M$ and $M'$ are said to be stuttering equivalent, denoted by $M \equiv_{st} M'$, if for each full execution sequence $\sigma$ of $M$ there exists a full execution sequence $\rho$ of $M'$ such that $\sigma \equiv_{st} \rho$, and vice versa (for each full execution sequence $\rho$ of $M'$ there exists a full execution sequence $\sigma$ of $M$ such that $\rho \equiv_{st} \sigma$).

The importance of stuttering equivalence is the following. We call an LTL formula $\varphi$ stuttering invariant if, for any two execution sequences $\sigma$ and $\rho$ such that $\sigma \equiv_{st} \rho$, it holds that $\sigma \models \varphi$ iff $\rho \models \varphi$. This definition, together with the definition of stuttering equivalence between state graphs, easily implies that if $\varphi$ is stuttering invariant and $M \equiv_{st} M'$, then $M \models \varphi$ iff $M' \models \varphi$. This is the basic idea behind partial order reduction: generate a model $M'$ with a smaller number of states than $M$ and use $M'$ to model check a stuttering invariant property $\varphi$. Lamport showed that any LTL property that does not use the next-time operator is stuttering invariant. Conversely, it has been shown that any stuttering invariant LTL formula can be written without the use of the next-time operator $\bigcirc$. From now on, we restrict ourselves to stuttering invariant LTL formulas.

We conclude this section by introducing two basic concepts used in partial order reduction. A transition $\alpha$ is said to be visible (with respect to $\varphi$) if there exist two states $s$ and $s'$ such that $s' = \alpha(s)$ and $L(s) \ne L(s')$.

The other key concept is the independence relation between transitions. Two transitions $\alpha, \beta \in T$ are said to be independent if for all states $s \in S$, if $\alpha, \beta \in enabled(s)$, then: (i) $\alpha \in enabled(\beta(s))$; (ii) $\beta \in enabled(\alpha(s))$; and (iii) $\alpha(\beta(s)) = \beta(\alpha(s))$. Intuitively, if both transitions are enabled at a state, then the execution of one of them must not disable the other (i and ii), and executing these transitions in either order must lead to the same state (iii). If two
transitions are not independent, then they are called dependent.

2.2 Basic partial order reduction

As explained in Section 2.1, the purpose of partial order reduction is to generate a reduced state graph $M'$ with a smaller number of states than the original state graph $M$ and with the property that $M' \equiv_{st} M$, and then to model check a stuttering invariant LTL formula $\varphi$ on $M'$ rather than on $M$. No matter what search technique is used (depth-first, breadth-first, explicit or symbolic), a traditional model checker has to generate the successors of a state $s$ for all the enabled transitions $\alpha \in enabled(s)$. A partial order search technique, however, attempts to explore the successors of a state only for a subset of the enabled transitions of $s$. Let us call such a set of transitions $ample(s) \subseteq enabled(s)$. Following the literature, we define this subset of transitions by the conditions C0 through C3 that it must satisfy. Exploring only the ample transitions results in the reduced state graph $M'$.

C0 (Emptiness). $ample(s) = \emptyset$ iff $enabled(s) = \emptyset$.

Since we are trying to generate, for each execution of $M$, a corresponding stuttering equivalent execution sequence in $M'$, we must explore at least one successor of $s$ in $M'$ if there are any successors of $s$ in $M$. In our case, we have assumed for convenience that state $s$ has at least one transition enabled in $M$ (cf. Section 2.1). Therefore, C0 implies that $ample(s) \ne \emptyset$.

C1 (Faithful decomposition). Along every execution sequence of transitions in $M$ that starts at $s$, a transition that is dependent on any transition in $ample(s)$ cannot be executed without a transition from $ample(s)$ occurring first.

This constraint is introduced to ensure that any execution sequence of the full state graph $M$ may be represented by a stuttering equivalent execution sequence in the reduced graph $M'$. For this purpose, the transitions of the original execution sequence may have to be re-ordered. Condition C1 ensures that all transitions before the
first ample transition $\alpha$ in the original execution sequence are independent of $\alpha$ and, hence, can be commuted with $\alpha$.

In order to implement condition C1, we can use further information about the semantics of the modeled system. For example, given a collection of concurrent processes, with the program counter of one process allowing the execution of only one local transition, choosing this transition as a singleton ample set will not violate condition C1. On the other hand, consider the case where the program counter of the process is at a point where there is a selection between two input messages, of which one is enabled at the current state and the other is not (until another process progresses to send such a message). In this case, selecting the enabled input transition $\alpha$ as a singleton ample set may violate C1, since some transitions independent of $\alpha$ may execute in the other process, enabling the alternative input transition, which is interdependent with $\alpha$. Implementing condition C1 typically involves identifying such cases. Each such case needs to be checked against the definition of condition C1.

C2 (Visibility). If there exists a visible transition $\alpha \in ample(s)$, then $ample(s) = enabled(s)$.

In other words, if $ample(s) \subset enabled(s)$ then no transition in $ample(s)$ is visible. In practice, one tries to avoid including visible transitions in the ample set at state $s$, since otherwise the entire set of enabled transitions $enabled(s)$ has to be explored. Indeed, let two independent transitions, $\alpha$ and $\beta$, be enabled at state $s_i$ in graph $M$, and let $M$ allow these two executions:

$\sigma = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots s_i \xrightarrow{\alpha} s^1_{i+1} \xrightarrow{\beta} s_{i+2} \cdots$
$\rho = s_0 \xrightarrow{\alpha_0} s_1 \xrightarrow{\alpha_1} \cdots s_i \xrightarrow{\beta} s^2_{i+1} \xrightarrow{\alpha} s_{i+2} \cdots$

Condition C1 by itself suggests that we do not necessarily need both executions $\sigma$ and $\rho$ in the reduced graph $M'$. That is, if only C1 is applied, then $\sigma$ may not be generated in $M'$, on the assumption that $\rho$ will represent it in $M'$ (or vice versa). Now, consider the
case that the propositional labeling is different at states $s^1_{i+1}$ and $s^2_{i+1}$, where the two executions $\sigma$ and $\rho$ differ from each other, i.e., $L(s^1_{i+1}) \ne L(s^2_{i+1})$. If both transitions $\alpha$ and $\beta$ are visible, then it cannot be guaranteed that executions $\sigma$ and $\rho$ are stuttering equivalent: they are not if, for instance, $L(s_i) \ne L(s^1_{i+1})$ and $L(s_i) \ne L(s^2_{i+1})$. Therefore, $ample(s) = enabled(s) = \{\alpha, \beta\}$ must hold in this case to force exploration of both transitions $\alpha$ and $\beta$. However, if one of the two transitions, say $\alpha$, is invisible, then $\sigma \equiv_{st} \rho$ is guaranteed. This is because in this case $L(s_i) = L(s^1_{i+1})$ and $L(s^2_{i+1}) = L(s_{i+2})$. Then we do not have to generate both execution orders in $M'$ and may select $ample(s) = \{\alpha\}$ to factor out the execution sequence $\rho$.

C3 (Cycle closing). Given an execution sequence $\sigma = s_0 \xrightarrow{\alpha_0} s_1 \cdots$ of $M$, if $s_k = s_0$ for some $k > 0$, then there exists [...]

[...] The only transitions that can directly affect the value of this predicate are $\alpha_1$ and $\alpha_3$. Therefore, these are the only visible transitions with respect to the given specification.

A transition may leave the value of an atomic proposition unchanged even though it assigns a variable referenced in it. This is hard to analyze at compile time. We conservatively mark as visible all local transitions that assign (data) variables used in the atomic propositions of a specification. An atomic proposition may also refer to specific control points, for example, be of the form "process $P$ stays at point $p_1$". In such a case, it can be statically analyzed whether the execution of a local transition, for example $(p_2, p_3)$, changes the value of the atomic proposition. If it does, it must be marked as a visible transition. Below, we assume that static analysis conservatively calculates the set of local visible transitions $E_v$, such that all local images of each visible global transition belong to $E_v$.

Besides visible transitions, the set of sticky transitions $T$ includes, for each cycle in the full state space of $M$, at least
one transition that is executed on that cycle. Consider Figure 3, which presents a global cycle in the state graph from Figure 2. It is easy to see that the local images of the global transitions that appear in this global cycle form a local cycle in the control flow graph $G_P$ (and likewise for $G_Q$). This is natural, since transition $\alpha_1$ moves the control point of process $P$ from $p_1$ to $p_2$. In order to complete the global cycle, the control point of $P$ has to be restored back to $p_1$. This is only possible by executing a sequence of global transitions whose local images form a local cycle in the graph $G_P$.

[Figure 3. A global cycle in the underlying state graph.]

In general, one can observe the following:

Lemma 1. If the global transitions $\{\alpha_1, \ldots, \alpha_k\} \subseteq T$ are executed on a global cycle, and if process $P \in act($ [...]

[...] $\in edges(G'_{P_i})\}$;
3 $g_i := rem(G'_{P_i}, \Gamma)$;
4 od;
5 $E := E_v \cup \bigcup_i backedges(g_i)$;
6 return $E$;
Algorithm 1. ComputeStickySet

Theorem 1. The set $T$ of all global transitions that execute the local transitions in $E$, which is returned by algorithm ComputeStickySet, forms a set of sticky transitions.

Proof: Since every visible transition executes a (local) transition in $E_v$ (see above) and $E_v \subseteq E$, $T$ includes all visible transitions. We prove next that global transitions in $T$ also break all global cycles. Consider a global cycle $C$, and let $loc(C)$ be the set of local cycles that $C$ is projected onto (cf. Lemma 1). Note that the execution of each global transition in $C$ involves executing local transitions in one or more cycles in $loc(C)$, and every local transition $\gamma$ in each cycle in $loc(C)$ will execute along $C$. Therefore, if $\gamma$ is monotonic, an opposite transition must also execute along some cycle in $loc(C)$.

If some cycle in $loc(C)$ includes a (visible) transition from $E_v$, then $C$ is broken, as $E_v \subseteq E$. Now, consider a cycle $C$ such that each cycle in $loc(C)$ includes only transitions from the set $E' = E \setminus E_v$, i.e.,
belonging to graphs $G'_{P_1}, \ldots, G'_{P_n}$. In this case, we prove that $C$ will include a global transition from $T$ that executes some local transition in $backedges(g_i)$, for some $i$. Assume, to the contrary, that $C$ does not execute any local transition in $\bigcup_i backedges(g_i)$ and, hence, that none of those local transitions belongs to any cycle in $loc(C)$. By Lemma 3, this means that no cycle in $loc(C)$ remains completely in $g_1, \ldots, g_n$, which may be the case only if each such cycle has already been broken by the algorithm's action that removes monotonic transitions in line 3.

Consider then a local cycle in $loc(C)$ that belongs to the graph $G'_{P_k}$ with the largest process index $k$, and a monotonic transition $\gamma$ removed from this cycle by the action in line 3 and, hence, included into the set $\Gamma$ in line 2. Now note that for transition $\gamma$ to be included into set $\Gamma$, all transitions opposite to $\gamma$ (if any) must belong to graphs $G'_{P_1}, \ldots$ [...] On the other hand, as explained above in the proof, $\gamma$ must have an opposite transition in (at least) one of the cycles in $loc(C)$, hence in $G'_{P_j}$, where $j$ [...]

[...] $bun(x)$, because $\to bun(x)$ is not a formula. Nor is $\exists x \to (copil(x) \land bun(x))$ correct, for the same reason (here copil means "child" and bun means "good"). We can say "there exists x such that x is good", but "such that" is merely a connector between "there exists x" and "is good"; it appears in speech but not in the formal syntax, and it does not have the meaning of implication.

In first-order logic we cannot quantify predicates
We cannot write $\forall om(x) \to bun(x)$ (om: "human"). After $\forall$ and $\exists$ comes a variable. So we write $\forall x\,(om(x) \to bun(x))$ or, more readably, $\forall x.\, om(x) \to bun(x)$.

Mind the order of the quantifiers
"Every roommate of every CS major likes to party." The subject is the roommate, but he is defined as such only in relation to a computer science student. So the direct translation introduces the latter first:
$\forall x.\, CSmaj(x) \to \forall y.\, rm(y, x) \to lp(y)$
We can, however, also start from the roommate, and characterize him by the existence of a computer science student whose roommate he is:
$\forall y.\, (\exists x\,(CSmaj(x) \land rm(y, x))) \to lp(y)$
Finally, we can introduce both variables (in any order) and then impose the constraints between them:
$\forall x \forall y.\, CSmaj(x) \land rm(y, x) \to lp(y)$
We can verify the equivalence of these formulations by transforming them into clausal form.

Footnote 1: Some examples are taken from http://www.cs.utexas.edu/users/novak/reso.html

Logic and Discrete Structures (Logica si structuri discrete), lecture notes, Marius Minea. Predicate logic: the resolution method. 29 February 2016.

Write the formula step by step, to avoid mistakes
"Everyone who has read all the books is a learned person." The sentence tells us: everyone who satisfies a certain condition is learned. So the formula will have the structure $\forall x.\, (\text{condition on } x) \to inv(x)$ (inv: "learned"). Then we write the condition "x has read all the books" (carte: "book", citit: "has read"): $\forall y.\, carte(y) \to citit(x, y)$. Hence we obtain formula (1):
(1) $\forall x.\, (\forall y.\, carte(y) \to citit(x, y)) \to inv(x)$, that is, $\forall x\,(\forall y\,(carte(y) \to citit(x, y)) \to inv(x))$.
Careful: by writing directly $\forall x \forall y.\, carte(y) \to citit(x, y) \to inv(x)$ or, equivalently, formula (2):
(2) $\forall x \forall y\,(carte(y) \land citit(x, y) \to inv(x))$
we obtain a different meaning! Let us see what this last formula says: however we choose x and y such that y is a book and x has read y, it follows that x is learned. So it is enough to pick one book read by x; there is no need for x to have read all the books!

Transforming (2) into clausal form gives:
(2) $\forall x \forall y\,(carte(y) \land citit(x, y) \to inv(x))$
$= \forall x \forall y\,(\neg(carte(y) \land citit(x, y)) \lor inv(x))$
$= \forall x \forall y\,(\neg carte(y) \lor \neg citit(x, y) \lor inv(x))$
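The difference between the nested reading $\forall x\,(\forall y\,(carte(y) \to citit(x,y)) \to inv(x))$ and the flat reading $\forall x \forall y\,(carte(y) \land citit(x,y) \to inv(x))$ can also be checked by brute force on a small finite interpretation. The Python sketch below is only an illustration: the universe and the contents of `carte`, `citit` and `inv` are made-up data, chosen so that one person has read a single book out of two and is not learned.

```python
def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q

def nested(universe, carte, citit, inv):
    """forall x ((forall y (carte(y) -> citit(x, y))) -> inv(x))"""
    return all(
        implies(
            all(implies(y in carte, (x, y) in citit) for y in universe),
            x in inv,
        )
        for x in universe
    )

def flat(universe, carte, citit, inv):
    """forall x forall y ((carte(y) and citit(x, y)) -> inv(x))"""
    return all(
        implies(y in carte and (x, y) in citit, x in inv)
        for x in universe
        for y in universe
    )

# Hypothetical interpretation: one person, two books, nobody is learned.
universe = ["ana", "b1", "b2"]
carte = {"b1", "b2"}
citit = {("ana", "b1")}   # ana has read only one of the two books
inv = set()

print("nested reading holds:", nested(universe, carte, citit, inv))
print("flat reading holds:  ", flat(universe, carte, citit, inv))
```

In this interpretation the nested reading is satisfied (ana has not read all the books, so nothing is required of her), while the flat reading is violated by the single book she did read. Adding `("ana", "b2")` to `citit` makes the nested reading fail as well, since ana would then have read every book without being learned.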
We can also see that the two formulas are different by transforming them:
(1) $\forall x\,(\forall y\,(carte(y) \to citit(x, y)) \to inv(x))$
$= \forall x\,(\neg\forall y\,(carte(y) \to citit(x, y)) \lor inv(x))$
$= \forall x\,(\exists y\,\neg(carte(y) \to citit(x, y)) \lor inv(x))$
$= \forall x\,(\exists y\,(carte(y) \land \neg citit(x, y)) \lor inv(x))$
We see that, after transforming the implication and pushing the negation down to the predicates, y is existentially quantified in formula (1) and universally quantified in (2). Skolemizing (1), we arrive at the clausal form
$(carte(f(x)) \lor inv(x)) \land (\neg citit(x, f(x)) \lor inv(x))$
whereas (2) gives
$\neg carte(y) \lor \neg citit(x, y) \lor inv(x)$

Taking (2) to be true, in order to have $inv(x)$ it suffices to have some y such that $carte(y)$ and $citit(x, y)$: the first two literals of the clause being false, $inv(x)$ is needed to make the clause true. So one book read is enough for x to be learned, which is not the meaning of the original statement.

Taking (1) instead, in order to have $inv(x)$ it suffices that there be no books (then $carte(f(x))$ is false, so $inv(x)$ is true by the first clause), or that x has read everything (then $citit(x, f(x))$ is also true, its negation is false, so by the second clause $inv(x)$ is true). Conversely, if $inv(x)$ is false, then by (1) $f(x)$ is a book not read by x. So if x reads all the books, $inv(x)$ is true, which is the meaning of the original statement.

Hence formulas (1) and (2) have very different meanings, although the only thing that differs in the way they are initially written is the position of some parentheses!

Negating a quantified formula switches the quantifier
$\neg\forall x\,\varphi = \exists x\,\neg\varphi$
$\neg\exists x\,\varphi = \forall x\,\neg\varphi$

Transforming the inside of a quantified formula does not change the quantifier
$\forall x\,\varphi = \forall x\,\varphi'$, where $\varphi'$ is the transformed formula.
This is evident: when working only on the part that does not depend on the quantifier (the part inside its scope), there is no reason for the quantifier to change, for negations to appear outside it, and so on. One example is (2) above: transforming the implication does not change the quantifiers $\forall x \forall y$.

To prove by contradiction, we first negate the conclusion
Proof by contradiction means assuming all hypotheses true and the conclusion false, and obtaining (by resolution) a contradiction. The first step is to negate the conclusion, before any other transformation, skolemization, etc. Otherwise, we do not obtain the correct result.

A simple example: if the conclusion is $\forall x \exists y\, P(x, y)$, the negation gives $\neg\forall x \exists y\, P(x, y) = \exists x \forall y\, \neg P(x, y)$. Skolemizing, the outer x becomes a constant a, and we have the unit clause $\neg P(a, y)$. If instead we skolemize the formula as it stands, y is a function of x, and we have $P(x, f(x))$. Negating at this point, we obtain $\neg P(x, f(x))$ which, although it also has P negated, is wrong, something completely different, and will lead to an incorrect argument and/or result.

Exercise
A: Everyone who has won a match has lost a match.
B: Nobody has won all the matches.
Formalize the statements. Are they consistent? Equivalent?
Prove your answer.

First, we identify the notions in the statement that have related meanings, namely the verbs "to win" and "to lose". We consider that a match is either won or lost, so we directly replace "lost" with the negation of "won". (Otherwise, one sees that B does not imply A: we may have a person with one match won and one drawn, which satisfies B but not A.) One could also discuss whether to introduce the notion of playing a match (nobody can either win or lose a match not played, for example a match between two other people). We stay with the simple variant of treating a match as a trial that one can describe as "won" or not, i.e., "lost".

Second, we interpret "nobody" and "everyone" as referring to all elements of the universe; otherwise we would introduce a predicate persoana ("person") to distinguish beings ("who") from other entities. On the other hand, "won" in A and B refers only to matches, so we need a predicate "match". (Had the negation not appeared, we could have worked directly with a predicate "wins a match".) With these, writing $m(y)$ for "y is a match" and $c(x, y)$ for "x has won y", we can formalize:
A: $\forall x\,\big(\exists y\,(m(y) \land c(x, y)) \to \exists z\,(m(z) \land \neg c(x, z))\big)$
B: $\neg\exists x \forall y\,(m(y) \to c(x, y))$ or, pushing the negation inward, $\forall x \exists y\,(m(y) \land \neg c(x, y))$
Note that A speaks of two distinct matches, one won, y, and one lost, z.

For the equivalence of A and B we must show $A \to B$ and $B \to A$. We approach both by contradiction.

Negating $B \to A$ we obtain $B \land \neg A$ (the hypothesis and the negation of the conclusion). For $\neg A$ we obtain:
$\neg\forall x\,\big(\exists y\,(m(y) \land c(x, y)) \to \exists z\,(m(z) \land \neg c(x, z))\big)$
$= \exists x\,\big(\exists y\,(m(y) \land c(x, y)) \land \neg\exists z\,(m(z) \land \neg c(x, z))\big)$
$= \exists x\,\big(\exists y\,(m(y) \land c(x, y)) \land \forall z\,(\neg m(z) \lor c(x, z))\big)$
We eliminate existential quantification by skolemization. In $\neg A$ we have two constants, for x and y: $m(b) \land c(a, b) \land \forall z\,(\neg m(z) \lor c(a, z))$. In B, we have a function $p(x)$ that gives a match lost by x: $\forall x\,(m(p(x)) \land \neg c(x, p(x)))$. We eliminate $\forall$ and obtain the clausal form for $\neg A \land B$:
1. $m(b)$
2. $c(a, b)$
3. $\neg m(z) \lor c(a, z)$
4. $m(p(x))$
5. $\neg c(x, p(x))$
Applying the resolution method and unifying clause 3 with clause 4 we obtain, with $z = p(x)$, the clause $c(a, p(x))$. Unifying it with clause 5 we obtain (with $x = a$) the empty clause and the desired contradiction. Hence $B \to A$.

The implication is evident informally as well: nobody has won all the matches, so everyone has lost at least one match, in particular anyone who has won a match (A). Note that the proof above did not use clauses 1 and 2, which express the (unused) premise of A. The argument is of the form $p \to (q \to p)$ (which, we recall, was one of the axioms of propositional and predicate logic).

We now try to prove $A \to B$ by obtaining a contradiction from $A \land \neg B$. We eliminate the implication from A:
$\forall x\,\big(\neg\exists y\,(m(y) \land c(x, y)) \lor \exists z\,(m(z) \land \neg c(x, z))\big)$
$= \forall x\,\big(\forall y\,(\neg m(y) \lor \neg c(x, y)) \lor \exists z\,(m(z) \land \neg c(x, z))\big)$
We write $\neg B$: $\exists x \forall y\,(m(y) \to c(x, y)) = \exists x \forall y\,(\neg m(y) \lor c(x, y))$.
We skolemize: in A, $p(x)$ is the match lost by x; in $\neg B$, a is a constant for x. Renaming y in $\neg B$ to z, we obtain the clausal form for $A \land \neg B$ (remember that skolemization is done only after just $\lor$ and $\land$ remain, with all negations pushed inward):
1. $\neg m(y) \lor \neg c(x, y) \lor m(p(x))$
2. $\neg m(y) \lor \neg c(x, y) \lor \neg c(x, p(x))$
3. $\neg m(z) \lor c(a, z)$
Unifying 1 and 3 (on all three m literals) we obtain $\neg c(x, y) \lor c(a, y)$. Unifying 2 and 3 (on all three c literals) we obtain $\neg m(p(a))$, etc. Still, we do not obtain the empty clause; in particular the literals $\neg m(\cdot)$, present in all three initial clauses, do not "disappear".

This points to the problem: $A \land \neg B$ is not a contradiction in a universe in which there are no matches! There, A is true (false implies anything), but B is false, because there is no y with $m(y)$!
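The countermodel described above can be checked mechanically. The Python sketch below evaluates A and B, in the formalization with the predicates m ("is a match") and c ("has won"), over two small finite interpretations. The concrete universes and element names are illustrative assumptions, not part of the original notes.

```python
def implies(p, q):
    """Material implication p -> q."""
    return (not p) or q

def holds_A(U, m, c):
    """A: forall x ((exists y (m(y) and c(x,y))) -> exists z (m(z) and not c(x,z)))"""
    return all(
        implies(
            any(y in m and (x, y) in c for y in U),
            any(z in m and (x, z) not in c for z in U),
        )
        for x in U
    )

def holds_B(U, m, c):
    """B: not exists x forall y (m(y) -> c(x,y))"""
    return not any(
        all(implies(y in m, (x, y) in c) for y in U)
        for x in U
    )

# Interpretation 0 (hypothetical): a nonempty universe without any match.
U0, m0, c0 = ["ana"], set(), set()

# Interpretation 1 (hypothetical): one match, which nobody has won.
U1, m1, c1 = ["ana", "meci1"], {"meci1"}, set()

for name, (U, m, c) in {"no matches": (U0, m0, c0),
                        "one lost match": (U1, m1, c1)}.items():
    print(name, "| A:", holds_A(U, m, c), "| B:", holds_B(U, m, c))
```

The first interpretation makes A true and B false, so A does not imply B without the extra assumption that a match exists. The second makes both statements true, which is one small witness that A and B are consistent.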
Adding the constraint $\exists y\, m(y)$, skolemized as $m(b)$, resolution easily leads to the empty clause.

This is not a paradox. Rereading the definition of an interpretation, we note that the universe U must be nonempty. It is not necessary, however, that it contain an element of every kind (matches, or green horses). In practice, we can arrive at false reasoning by relying on the existence of an object that satisfies some condition (possibly complicated and error-prone) when in fact that condition is not satisfiable.

Informally, we are tempted to prove B (everyone has a match that they lost) as follows: everyone has either won or lost a match. If they lost it, we have the desired conclusion, and if they won it, A ensures that they lost another match. The error (hard to notice) lies in the premise of the case split, "everyone has either won or lost a match": it assumes that matches exist. This shows the importance of formalization.

At the exam, the attempt to prove the equivalence (even when argued informally) was awarded points. Most students, however, asserted it without any argument, or showed only one implication, not both.

Two (or more) statements are consistent if they do not lead to a contradiction. It was enough to give a (small) example (an interpretation) in which A and B are true.

Specifying and Verifying Partial Order Properties Using Template MSCs*

Blaise Genest (1), Marius Minea (2), Anca Muscholl (1), and Doron Peled (3)
(1) LIAFA, Université Paris VII & CNRS, 2, pl. Jussieu, case 7014, 75251 Paris cedex 05, France
(2) Department of Computing, "Politehnica" University of Timisoara, Bd. V. Parvan nr. 2, RO-300223 Timisoara, Romania
(3) Department of Computer Science, The University of Warwick, Coventry, CV4 7AL, United Kingdom

Abstract. Message sequence charts (MSC) are a graphical language for the description of communication scenarios between asynchronous processes. Our starting point is to model systems using an assume-guarantee formalism, in the style of LSCs and Triggered MSCs. We enrich MSCs with the possibility of using gaps (template MSC), and show
their expressivity. This formalism also allows logical formulas to be expressed. We analyze the model-checking problem, whose complexity is linear in the size of the system, and ranges from PTIME to EXPSPACE in the size of the template formula.

1 Introduction

Concurrent systems are intricate and hence difficult to describe. The classical description, stemming from programming practices, is based on listing the different concurrent participants, e.g., the processes. The Message Sequence Charts (MSC) formalism allows an alternative "sequential" description of a concurrent system, where the complete behavior of all the processes involved in some given task is depicted in a visual way. The language enjoys widespread use in the specification of telecommunication protocols and has been standardized by the ITU-T. In a single MSC we can describe the behavior of all the processes involved, including the local actions and the messages exchanged between them. Such a slicing of the concurrent execution provides further intuition about the behavior of the system. One of the drawbacks of this representation is that tasks are seldom executed in a sequential way, and some overlap commonly exists.

In this paper we study an MSC-related formalism that allows expressing non-contiguous tasks. This is done by adding gaps to the MSC formalism. Intended for the analysis of systems, we present the formalism and study related verification problems. We are influenced in our proposal by Live Sequence Charts and Triggered MSCs, and include an assume-guarantee mechanism, i.e., the ability to require the execution of a task provided that another task was executed.

* Work supported by the EU-TMR project GAMES.
I. Walukiewicz (Ed.): FOSSACS 2004, LNCS 2987, pp. 195-210, 2004. © Springer-Verlag Berlin Heidelberg 2004

While an individual MSC has a formally defined semantics, its relation to the system behavior is left open by the standard: the usual interpretation is that the scenario should be
possible in the implementation. In defining Live Sequence Charts, Damm and Harel extensively emphasize the duality of mandatory and provisional semantics, but with a much wider set of features, including abort/exit conditions and reliable or lossy transmission. The provisional semantics is used with the standardized High-Level MSCs (HMSCs for short), which are described by (hierarchical) graphs with nodes labeled by MSCs. The semantics of an HMSC is the set of MSCs formed by concatenating (process by process) the MSCs seen along a path. HMSCs have several drawbacks, such as the difficulty of expressing concurrency between two independent threads, due to the sequential control of the graph. The result is that many systems are hard to model using HMSCs. To address this problem, other kinds of specifications have been proposed, e.g., based on Petri nets with transitions labeled by MSCs.

A totally different approach is taken by Triggered MSCs. They replace the sequential description of HMSCs by an assume-guarantee formalism (which also exists in LSCs in the form of activation messages). Causality is expressed by structuring a specification with two components: a precondition that identifies the initial behavior, and a postcondition expressing the continuation supposed to be guaranteed under this assumption. Assume-guarantee combined with the parallel operator emphasizes compositionality: a system description is most easily obtained by combining MSCs for collections of directly interacting processes, and superimposing assume-guarantee patterns that further constrain interactions between individual scenarios.

We are inspired by the Triggered MSCs notation. Our suggestion attempts to improve on several points, for example, making the use of infinite assume-guarantee easier to understand. Our main contribution is to define template MSCs, and to use them in the Triggered MSCs setting. We achieve conciseness by specifying only the events strictly needed to identify a scenario and by using gaps as placeholders
for other messages. With gaps, parallel composition can be simply expressed as conjunction, without the need for parallel (shuffle) operators. Using assume-guarantee template MSCs we can easily specify loops and thus infinite specifications.

A second important use of assume-guarantee template MSCs is the ability to easily specify properties that a system should satisfy (the system is given here as a set of FSMs communicating through (existentially) bounded FIFO message queues, or as an HMSC). We can express temporal properties, e.g., the fact that whenever A happens, B should eventually follow. Compared to temporal logics, MSCs have the advantage of being a visual formalism, and are thus easier to use in a design and engineering environment. Moreover, template MSC formulas are a fragment of a partial-order global logic with filter, whose complexity would be much higher. We study the complexity of verifying temporal properties expressed by various classes of templates and show that it ranges from PTime to ExpSpace in the size of the formula, and is linear-time in the size of the system.

Specifying and Verifying Partial Order Properties Using Template MSCs

One of the main differences between template MSCs and LSCs or Triggered MSCs is the use of gaps inside the MSC notation, in order to express an arbitrary (but finite) amount of communication or events. The user can also draw single send/receive events, with the matching event being located in a gap. Another difference is that we are using template MSCs as a visual specification formalism, as an alternative to temporal logic specification. Our specification is partial-order based, related to logics such as LTrL, TLC and MSO.

A variant of model-checking for MSCs and HMSCs has been considered elsewhere; it uses an alternative semantics that consists in adding gaps between each pair of events on each process. This allows combating the undecidability of HMSC intersection. The approach in this paper is different. Gaps are added in the specification,
and their locations and types need to be explicitly specified. For the full version see http:  www crans org  genest fossacs03 full paper ps

2 Message Sequence Charts and Templates

Message Sequence Charts (MSC for short) is a scenario language standardized by the ITU. MSCs are simple diagrams depicting the activity and communications in a distributed system. The entities participating in the interactions are called instances (or processes) and are represented by vertical lines. Message exchanges are depicted by arrows from the sender to the receiver. In addition to messages, atomic actions can also be represented. The left part of Figure 1 gives an example of an MSC M modeling two messages sent between a Writer W and a Server S.

Definition 1 An MSC is a tuple M = (P, E, A,  , m, MS are assume-guarantee template MSCs that define sets of MSCs, denoted by $L(M_a \mapsto M_g)$, $L(M_a \mapsto \neg M_g)$:

B. Genest et al.

- $L(M_a \mapsto M_g) = \{N \in \mathrm{MSC} \mid \text{for every decomposition } N = ST, \text{ either } S \notin L(M_a) \text{ or } T \in L(M_g)\}$
- $L(M_a \mapsto \neg M_g) = \{N \in \mathrm{MSC} \mid \text{for every decomposition } N = ST, \text{ either } S \notin L(M_a) \text{ or } T \notin L(M_g)\}$

For an example of an assume-guarantee template MSC see Figure 4. Notice that S, T can be CMSCs, but ST = N is required to be an MSC. Note also that assume-guarantee template MSCs generalize MSCs, since every MSC M can be represented as $\epsilon \mapsto M$, where $\epsilon$ is the empty MSC.

A template MSC formula is a conjunction $\bigwedge_i (M_a \mapsto \bigvee_j \pm M_{g_j})$, where $\pm$ means that guarantee MSCs may appear in either positive or negated form. That is, for each of the individual assume-guarantee specifications of the outermost conjunction, we have preconditions in the form of positive scenarios, and postconditions as disjunctions of either positive or negative (forbidden) scenarios. Hence, an MSC N belongs to $L(M_a \mapsto \bigvee_j \pm M_j)$ if for every decomposition N = ST, whenever $S \in L(M_a)$, either $T \in L(M_j)$ for some positive $M_j$, or $T \notin L(M_j)$ for some negative $M_j$. This conditional description allows in particular the guarantee false, with $L(\mathrm{false}) = \emptyset$. For
example, an MSC N satisfies $M \mapsto \mathrm{false}$ iff no prefix of N is in L(M). The formula $\epsilon \mapsto \neg M$ describes the complement of L(M).

3 Modeling Using Template MSCs

A first application of template MSCs is modeling protocols more easily than with HMSCs. The drawback of the standard HMSC notation is that one needs a global (graph) description combining several scenarios, resp. behaviors, of the system. Using template MSCs, we model each behavior locally, that is, each scenario is described on the processes that it involves. We then restrict the combination of these local behaviors by using template formulas. In the latter step, using template MSCs allows us to focus only on the relevant messages in a scenario, and avoid both repetition and the inclusion of unrelated messages.

Fig. 2. Global Behavior of Writer-Server-Reader System: $(N_1 \vee N_2)^* \parallel (N_3 \vee N_4)^*$

We present an example that illustrates the major features of our approach, namely a reader-writer example taken from earlier work. The system consists of three processes: a writer W and a reader R which concurrently access variables maintained by a server S. The latter has the task of maintaining atomicity and serialization of read and write operations, each of which is performed in two phases. Since Triggered MSCs cannot deal easily with infinite specifications, the original example involves a single read/write operation. With template formulas we do not have this problem, so we extend this example to arbitrarily many write/read operations.

The writer W performs a tentative update of variable x by sending a message w(x) to the server S; x is now in a "dirty" state. Then, W performs a local action ok or fail which decides on the outcome of the write, and sends the corresponding message commit or abort to the server. A commit marks the variable as "clean". An abort causes the
server to perform a local rollback action rb and potentially influences a read in progress. The reader R can send the server a request r(x) for the variable x, to which the server responds with a value val(x). Subsequently, the server either follows up with a commit message, if the sent value was clean or has since been committed by the writer, or sends an abort if the sent value has to be rolled back.

Although many different orders of interactions between the three processes are possible, the interaction between the pairs of directly communicating processes is simple. Our system description above contains a pair of basic scenarios for both writer-server and reader-server interaction, depicted in Figure 2. The global behavior is a subset of that given by composing these individual scenarios. Using a notation similar to Triggered MSCs, we would write $(N_1 \vee N_2)^* \parallel (N_3 \vee N_4)^*$. However, one side effect of gaps is that they make the definition of a parallel composition operator unnecessary, assuming that we compare MSCs with different type sets. To express $N_1 \parallel N_3$ for instance, it suffices to extend both MSCs to all three processes, and add gaps in between all messages. The gaps in N1 (resp. N3) allow only events of N3 (resp. N1). Then, parallel composition simply becomes conjunction (language intersection): $L(N_1 \parallel N_3) = L(N_1) \cap L(N_3)$.

We need slightly more work for expressing the star of languages. First, we need an initialization step $(\epsilon \mapsto M_1 \vee M_2)$ for the writer, meaning that every MSC in the specification should begin on W, S with a write, and then either ok and commit, or fail and abort. Anything can happen next, as allowed by the unrestricted gap $\gamma^*$. The MSCs M1, M2 are defined in Figure 3, where the gap $\gamma^*$ has no restriction, while $\gamma_R$ is restricted to events of N3, N4. By adding an inductive step we obtain the specification. Namely, we need that either M1 or M2 happens after each message commit (or event rb), or there is no more event on W, S (gap $\gamma_R$), specified as: $(\gamma^*\,\mathrm{commit} \mapsto M_1 \vee M_2 \vee \gamma_R) \wedge (\mathrm{rb} \mapsto M_1 \vee M_2 \vee \gamma_R)$. The same applies for M3, M4.

These individual scenarios are interdependent, so the global system behavior is obtained by imposing additional constraints on their composition. We divide the constraints into an assumption part that identifies the initial behavior in a scenario, and a guarantee part expressing the behavior expected of the system under this assumption.

Fig. 3. Initialization: $(\epsilon \mapsto M_1 \vee M_2) \wedge (\epsilon \mapsto M_3 \vee M_4)$

For our WSR example we identify 5 cases specified with the constraints in Figure 4. M5 states that if a write on the Server is followed by a send of x to the Reader, and the Writer aborts (precondition), then the Server should inform the Reader about the abort (postcondition). The occurrence of the read is guaranteed by M3 or M4, so it need not be specified again. Likewise, M1, M2 ensure that if there is no write between a send of x and an abort of the Writer, then the write has occurred in the first gap. This precondition will imply an abort for the read (postcondition). The remainder of the interaction need not be specified, so we allow gaps in between these actions, corresponding to sending and receiving other messages. The other cases correspond to a commit of the value. Namely, a value is sent while no write has been produced (M6), or a value is sent after the last write has been rolled back (M7) or committed (M8), or a commit is received immediately after the value is sent (M9).

Fig. 4. Assume-Guarantee Scenarios for Writer-Server-Reader System

Hence the constraint is $M_5 \wedge M_6 \wedge M_7 \wedge M_8 \wedge M_9$. Without template MSCs, we would
need to write at least every possible instantiation for gaps in our 5 cases, yielding at least 12 cases. For instance, an HMSC specifying the same model would require at least 19 states. Moreover, the size increases even more severely (exponentially) if instead of a single reader we allow several. With template formulas we express the constraints for each Writer/Reader pair, while an equivalent HMSC has to describe all possible combinations over all Readers.

This lack of conciseness of HMSCs is a real drawback, since many algorithms involving MSCs are at least NP-hard. First, HMSCs are unable to represent parallel composition, which can lead to an exponential blow-up compared with template formulas, and to specifications that are harder to understand. Second, HMSCs are finitely generated, which prevents them from implementing simple protocols such as the alternating bit. Third, HMSCs cannot be complemented in general. Hence, since template formulas implicitly complement the assume part, they are not subsumed by HMSCs.

4 Specifying Properties

4.1 Logical Properties

Template MSC formulas can describe some interesting properties easily and concisely, and can be model-checked (see next section). We can use them for describing global properties of MSC configurations and use gaps as filters, i.e., for restricting the types of events. We denote in the examples below by $\gamma$ an unrestricted gap over all processes, and by $\gamma_{-a}$ a gap that can generate all event types except for a.

- $(\gamma A) \mapsto \mathrm{false} = \epsilon \mapsto \neg(\gamma A \gamma)$: No execution contains the MSC A.
- $\gamma \mapsto \gamma A \gamma$: Every execution contains infinitely often the MSC A.
- $\gamma A \mapsto \gamma B \gamma$: Whenever A occurs, eventually B will occur.
- $(\gamma A \mapsto \gamma a \gamma) \wedge [\epsilon \mapsto (\gamma_{-a} \vee \gamma_{-a} A \gamma)]$: The MSC A may occur; if this is the case, then the event a must follow. Moreover, event a cannot occur before A. One can see a as an alarm event that is triggered by A.

The theorem below shows that the expressiveness of
compositional gaps has a drawback, namely that the satisfiability problem for template formulas is undecidable in general. However, we can check the satisfiability of a template formula $\varphi$ if we ask only for MSCs that have at least one linearization where the size of each channel is bounded by some given value b (it is possible that other equivalent linearizations have higher bounds). A set S of such MSCs is called existentially b-bounded. For instance, every HMSC (even every realizable compositional HMSC, see ) is existentially bounded.

Theorem 1
1. Given a bound b and a template MSC formula $\varphi$, it is decidable whether there exists an existentially b-bounded MSC in $L(\varphi)$.
2. It is undecidable whether a template MSC formula $\varphi$ satisfies $L(\varphi) = \emptyset$.

The proof of the first statement above follows from the results in the next section. For the second statement, we reduce from the Post correspondence problem, making use of the unbounded communication channels.

4.2 Model-Checking Template Formulas

We consider now the problem of verifying an implementation of a communication protocol S with respect to a template MSC formula $\Phi$. A different approach using partial order MSO for the specification $\Phi$ gives decidability for the model-checking problem, albeit at very high cost. As suggested by Theorem 1, the system S needs an existential bound on buffers, denoted by $b_S$. This includes protocols modeled by HMSCs, communicating finite state machines with existentially bounded FIFO buffers (and even realizable compositional HMSCs, see ).

The model for the implementation here is a finite automaton (FSM), generating linearizations of MSCs. We do not require that S is linearization-closed, i.e., S may generate a linearization of some MSC without generating all of them. We can obtain a linear-size FSM from any (realizable compositional) HMSC: it suffices to replace each node by a linearization of the CMSC labeling the node.

Definition 5 For an FSM S and an assume-guarantee MSC $M_a \mapsto \pm M_g$ we
write $S \models (M_a \mapsto \pm M_g)$ if $L(S) \subseteq L(M_a \mapsto \pm M_g)$. The satisfaction of a template formula is defined according to the usual semantics of $\wedge$, $\vee$.

In the following, we give complexity results for checking $S \models \Phi$ for various classes of template MSC formulas $\Phi$. While S can be very large, real-life formulas $\Phi$ and existential channel bounds $b_S$ are pretty small. Hence we focus on keeping the complexity linear w.r.t. S. We will transform the formula into an automaton, so our algorithm will be automata-based. Moreover, checking that $S \models \bigwedge_i \Phi_i$ is done for each $\Phi_i$ separately.

Proposition 1 Given an FSM S with channel bound $b_S$ and a template MSC $M_g$, we can check whether $S \models \epsilon \mapsto M_g$ in space exponential in $b_S |M_g|$ and logarithmic in $|S|$.

Proof Let E be the set of events in the template MSC $M_g$. Let M be the MSC obtained from $M_g$ by replacing each gap by the empty MSC. Let us fix a linearization $x = x_1 \cdots x_n$ of M. We show how to construct an NFA $A_x$ accepting every linearization of $Lin(M_g)$ whose events occur in the order given by x. For each gap $\gamma$ of $M_g$ and each process p that is allowed in $\gamma$ we use a new symbol $g_p$. We first set the beginning and the end of $\gamma$ on process p by choosing two positions i r on q would satisfy xp fMonies;  *@ represents coins  such th:at @ coins == (  sum int i; @ 0 0 together with a differentiable curve f that maps every real time in the compact interval to an I/O state. For types other than $\mathbb{R}$, we assume that only constant functions are differentiable. The source of the flow is the I/O state $f(0)$, and the sink is $f(\delta)$. For any two successive execution steps, the sink of the first must coincide with the source of the second. In figures, arrows with double tips denote flows, whereas normal arrows represent jumps. The set $E_A$ of executions is prefix-closed. Indeed, if a component permits a flow of a certain duration, then all restrictions of the flow to shorter durations, including the restriction to duration 0, are also permitted.

Every component is deadlock-free, in the sense
that (1) if the jump entry condition of a location a is satisfiable at an I/O state q, then there is an execution with origin a which starts with a jump with source q, (2) if the flow entry condition of location a is true at q, then there is an execution with origin a which starts with a flow with source q, and (3) every execution that does not end in a destination location can be prolonged by either a destination or a jump. Indeed, the stronger condition of input-permissiveness holds, which asserts that a component cannot deadlock no matter how the environment decides to change the inputs, by either jumping or flowing. Prefix-closure, deadlock-freedom, and input-permissiveness are formally defined and proved in the full version of this paper. They are essential properties of every component, because the environment (another component) may decide to interrupt a flow at any time to perform a jump, in which case the component must be prepared to match the environment jump by a local jump.

Atomic components. Every component in Masaccio is built from two kinds of atomic components, with discrete and continuous behavior, respectively. An atomic component has an arbitrary number of input and output variables, but only two locations, which serve as origin and destination, respectively, for its executions, all of which contain a single step. For an atomic discrete component, that step is a jump; for an atomic continuous component, a flow. The legal jumps of an atomic discrete component are defined by a jump predicate, which constrains the output values of the sink depending on the source I/O state and on input values of the sink. Such a predicate is typically specified by a difference equation. The legal flows of an atomic continuous component are defined by a flow predicate, which constrains the time derivatives of output variables depending on the current I/O state and on the current time derivatives of input variables. Such a predicate is typically specified by a differential
equation, as in Figure 2. A flow predicate may also constrain the values of output variables, so that a flow must not go on for any duration that would violate this "invariant" condition. Both jump predicates and flow predicates may allow nondeterminism.

Operations on components. Discrete components are built from atomic discrete components using the six operations of parallel and serial composition, variable and location renaming, and variable and location hiding, arbitrarily nested. The discrete components conservatively extend Reactive Modules [AH99] by serial composition. Hybrid components are built from both discrete and continuous atomic components using the same six operations.

Parallel composition is defined synchronously, as conjunction, with static await dependencies between outputs and inputs preventing circularity. For two components A and B, an execution of the parallel composition $A \parallel B$ starts at a common location in $L_A \cap L_B$. The execution is synchronous in both components: each jump of A must be matched by a concurrent jump of B, and each flow of A must be matched by a concurrent flow of B with the same duration. Control exits the parallel composition when it exits any one of the two components: if the execution of A reaches a destination location, then the concurrent execution of B is preempted and terminated; if B reaches a destination location, then the concurrent execution of A is terminated; if both A and B simultaneously reach destination locations, then the result is nondeterministic.

When constructing a parallel composition $A \parallel B$, inputs of A can be identified with outputs of B, and vice versa, by renaming variables. Such identifications are depicted by solid lines in the figures. Similarly, locations of A can be identified with locations of B by renaming locations; these identifications are depicted by dotted lines. We write A[x := y] for the component that results from renaming the variable x in A to y, and A[a := b] for the component that results from
renaming the location a in A to b. In Figure 1, the component RobotA is the parallel composition of the components ControlA and MotorA. Before composition, the two entry locations eC and eMT are renamed to a common location eR.

Serial composition and location hiding can be used to achieve the sequencing of components. Serial composition represents disjunctive choice between the executions of two components. For two components A and B, an execution of the serial composition A + B is either an execution of A or an execution of B. Hiding renders a location internal to a component, and inaccessible (invisible) from the outside. The executions of the resulting component are obtained by stringing together at that location any finite number of executions of the original component. To avoid internal deadlock, a location a can be hidden only if its jump entry condition is valid, so that it can always take another jump at a. We write $A \setminus a$ for the component that results from hiding a in A.

Fig. 3. Serial composition and location hiding

Figure 3 shows how a sequential component (representing the straight movement of the robot in the lead mode) is obtained by the serial composition of several components, followed by location hiding. Let StraightA = $(S_1 + S_2 + S_3) \setminus a$, where $S_1$ and $S_3$ are atomic discrete components, and $S_2$ is obtained from an atomic continuous component by renaming the destination location to the origin location. The resulting component initializes its output variables by a jump, flows (without output changes) for any amount of time as long as the input obstA remains false, and nondeterministically exits with a jump. In the same way, any "automaton structure" can be built from individual "edges" (i.e., atomic components)
using serial composition, location renaming, and location hiding.

Variable hiding builds an abstract component by turning some outputs of a component into internal state. Hidden variables, however, do not maintain their values from one exit of a component to a subsequent entry, but they are nondeterministically reinitialized upon every entry to the component so as to satisfy the applicable entry condition. We write $A \setminus x$ for the component that results from hiding the output variable x of the component A.

3 Assume-Guarantee Refinement between Components

If component A refines component B, then B can be viewed as a more abstract (permissive) version of A, with some details (constraints) left out in B which are spelled out in A. In particular, in the trace-based semantics of concurrent systems, refinement is taken to be the containment relation on trace sets: if A refines B, then A is a more specific description of system behavior than B in the sense that A may be equivalent to $B \parallel C$ for some parallel context C which constrains the inputs to B. In analogy, in the trace-based semantics of sequential systems, refinement ought to be interpreted as the prefix relation on trace sets: if A refines B, then A is a more specific description of system behavior than B in the sense that A may be equivalent to B + C for some serial context C which constrains the continuations of B. Consequently, in Masaccio, if A refines B, then A may specify fewer traces and longer traces than B.

The refinement relation. Component A refines component B if the following two conditions are satisfied:

1. Every output variable of B is an output variable of A, every input variable of B is an I/O variable of A, and the dependency relation of B is a subset of the dependency relation of A.
2. For every execution $(a, w)$ (or $(a, w, b)$, respectively) of A, either $(a, w[V_B])$ (or $(a, w[V_B], b)$, respectively, where $w[V_B]$ is the projection of w to the variables of B) is an execution of B, or there exist a proper, nonempty
prefix $w'$ of w and an interface location $c \in L_B$ such that $(a, w'[V_B], c)$ is an execution of B.

Note that the second condition implies that every interface location of A is an interface location of B. Furthermore, by input-permissiveness, if A refines B, then for every location a of A, the jump entry condition of a in A implies the jump entry condition of a in B, and the flow entry condition of a in A implies the flow entry condition of a in B.

Compositionality. All six operations on components are compositional.

Theorem 1 Let A and B be components, let x and y be variables, and let a and b be locations so that the following expressions are all well-defined. If A refines B, then $A \parallel C$ refines $B \parallel C$; and A + C refines B + C; and A[x := y] refines B[x := y]; and A[a := b] refines B[a := b]; and $A \setminus x$ refines $B \setminus x$; and $A \setminus a$ refines $B \setminus a$.

More generally, define a context to be a component expression that can take a component as a parameter. For instance, if $(A + B) \parallel D$ is well-defined, we can regard $C[\cdot] = ([\cdot] + B) \parallel D$ as a context for component A.

Corollary 1 Let $C[\cdot]$ be a context for both $A_1$ and $A_2$. If $A_1$ refines $A_2$, then $C[A_1]$ refines $C[A_2]$.

Assume-guarantee reasoning. Our assume-guarantee rule states that for discrete components, if two components can be individually replaced in a context while maintaining refinement, then both can be replaced simultaneously. Therefore, in order to show that a complex component $C[A_1, B_1]$ (the "implementation") refines a simpler component $C[A_2, B_2]$ (the "specification"), it suffices to look at simplified versions of the implementation one at a time. First, we prove that $A_1$ refines its specification $A_2$, under the "assumption" $B_2$; then, we prove that $B_1$ refines its specification $B_2$, under the "assumption" $A_2$. This reasoning is inherently circular. A special case is the assume-guarantee rule for the parallel composition of Reactive Modules [AH99]: take the context $C[\circ, \bullet]$ in the following theorem to be $\circ \parallel \bullet$. The proof relies on the deadlock-freedom and
input-permissiveness of components. It also requires that each execution of a serial composition can be uniquely assigned to one of the components. This can be achieved by disjoint entry conditions. We say that the serial composition A + B is jump-deterministic if for all common interface locations $a \in L_A \cap L_B$, the conjunction $\varphi_A^{jump}(a) \wedge \varphi_B^{jump}(a)$ is unsatisfiable, and flow-deterministic if $\varphi_A^{flow}(a) \wedge \varphi_B^{flow}(a)$ is unsatisfiable for all $a \in L_A \cap L_B$. The serial composition A + B is deterministic if it is both jump-deterministic and flow-deterministic.

Fig. 4. Component LeadA

For hybrid modules, we need to break the circularity of the rule, by relaxing one assumption, say, $B_2$, to allow arbitrary flows at all hidden locations. We write rlax($B_2$) for the component that results from $B_2$ by (1) replacing every flow predicate in $B_2$ by true, and (2) serially composing every hidden location a of $B_2$ which is not the origin location of any flow, with an atomic continuous component that permits all flows from origin a to destination a.

Theorem 2 Let $C[\circ, \bullet]$ be a context whose arguments are not in the scope of any variable or location hiding. Suppose that all input variables of $C[A_2, B_2]$ are variables of $C[A_1, B_1]$, and that within $C[A_2, B_2]$ the context arguments are not within the scope of any nondeterministic serial composition. If $C[A_1, \mathrm{rlax}(B_2)]$ refines $C[A_2, \mathrm{rlax}(B_2)]$, and $C[A_2, B_1]$ refines $C[A_2, B_2]$, then $C[A_1, B_1]$ refines $C[A_2, B_2]$.

Linear components. If all flows are specified by linear differential equations, and no degenerate flows of 0 duration can be enforced, then the existence of unique solutions allows us to strengthen the assume-guarantee rule. In this case, we can make circular assumptions about the flows. An open linear condition on a set V of real-valued variables is a conjunction of boolean variables and strict comparisons between linear combinations of the variables in V. Consider a
flow action F (consult the appendix for a definition). The atomic continuous component A(F) is linear if (1) all variables in ) have the type $\mathbb{R}$, and (2) the flow predicate has the form $a(X_F) \wedge (\dot{Z}_F = p(X_F, \dot{Y}_F))$, where a is an open linear condition, called invariant, on the source variables $X_F$, and p is a set of linear combinations, one for the derivative $\dot{z} \in \dot{Z}_F$ of each controlled flow variable, of the source variables $X_F$ and the derivatives $\dot{Y}_F$ of the uncontrolled flow variables. A component is linear if (1) all its atomic continuous components are linear, and (2) all its serial compositions are flow-deterministic. Let rlax' be defined like rlax, with the difference that only the invariants rather than the flow predicates are replaced with true.

Theorem 3 Let $C[\circ, \bullet]$ be a context whose arguments are not in the scope of any variable or location hiding. Suppose that $C[A_1, B_1]$ and $C[A_2, B_2]$ are linear components, that all input variables of $C[A_2, B_2]$ are variables of $C[A_1, B_1]$, and that within $C[A_2, B_2]$ the context arguments are not within the scope of any nondeterministic serial composition. If $C[A_1, \mathrm{rlax}'(B_2)]$ refines $C[A_2, \mathrm{rlax}'(B_2)]$, and $C[A_2, B_1]$ refines $C[A_2, B_2]$, then $C[A_1, B_1]$ refines $C[A_2, B_2]$.

Fig. 5. Components StraightA and TurnA

4 A Two-Robot Example

We continue the presentation of the two-robot system whose overall view was given in Section 2. Robot A (Figure 1) starts out as the leader. After a while it may move from LeadA to FollowA, as indicated by the dotted line connecting location xL (with an unsatisfiable entry condition, which is not shown) and location eF. It may then move back to lead mode (line xF-eL2). Robot B has the same structure, except that it starts out in follow mode. Within the subcomponent MoveA (Figure 4), the
robot can execute in StraightA arbitrarily long while there is no obstacle. Upon sensing an obstacle, control is passed to the component TurnA, which commands the robot to rotate for an amount of time given by the timer variable clktA. Control then returns to the component StraightA. The sequence of straight moves and turns continues until robot B switches to leading status. This event is modeled by the boolean signal switchB, which is monitored by the component SwitcherA. We require the switcher unit to preempt execution of the lead mode within a specified amount of time Tsw after the other robot has signaled its intention to lead. Once LeadA is exited, control enters the component FollowA, which samples the values of leftB and rightB and drives its own motor signals leftA and rightA. The robot may stay in the follow mode arbitrarily long, provided that obstA is false. At any time it may also issue the signal switchA, exit the component FollowA and switch back to lead mode.

We now present a robot implementation that contains a modified component LeadA, which does not continuously observe the switch signal (Figure 7). Instead, the implementation samples the leading indicators of both robots with a period Ted, as measured by the global clock clk. If both robots are leading, a correction is made by the component ErrordetectA. The new state depends on the last sampled values of the leading signals: the robot that had been leading before now switches to follow mode. We wish to show that when composed together, two robot implementations refine the parallel composition of two robot specifications, provided that Ted

Fig. 6. Components SwitcherA and FollowA

Note that CA[ControlA] does not refine CA[ControlA], because a robot implementation meets the specification only when composed with a symmetric robot. This is where assume-guarantee reasoning helps. All continuous components in the system are linear. Hence by Theorem 3, it suffices to discharge the simpler assertions
CA[ControlA] || CB[ControlB] refines CA[ControlA] || CB[ControlB]
CA[ControlA] || CB[ControlB] refines CA[ControlA] || CB[ControlB],
where ControlB = rlax'(ControlB). We simplify further using compositionality (Theorem 1), and are left to prove that
ControlA || ControlB refines ControlA || ControlB
ControlA || ControlB refines ControlA || ControlB,
two proof obligations that involve simpler components than the original one.

The power of the assume-guarantee rules of Theorems 2 and 3 stems from the fact that they can be applied to components arbitrarily deep in the design hierarchy, creating proof obligations which have smaller differences between the two components which are supposed to refine each other.

Acknowledgments. We thank Rajeev Alur, Radu Grosu, and Edward Lee for many stimulating discussions.

References

[ACH+95] R. Alur, C. Courcoubetis, N. Halbwachs, T.A. Henzinger, P.-H. Ho, X. Nicollin, A. Olivero, J. Sifakis, and S. Yovine. The algorithmic analysis of hybrid systems. Theoretical Computer Science, 138:3-34, 1995.
[AG00] R. Alur and R. Grosu. Modular refinement of hierarchic reactive machines. In Principles of Programming Languages, pp. 390-402, ACM Press, 2000.
[AGH+00] R. Alur, R. Grosu, Y. Hur, V. Kumar, and I. Lee. Modular specification of hybrid systems in Charon. In Hybrid Systems: Computation and Control, LNCS 1790, pp. 130-144, Springer-Verlag, 2000.
[AH97] R. Alur and T.A. Henzinger. Modularity for timed and hybrid systems. In Concurrency Theory, LNCS 1243, pp. 74-88, Springer-Verlag, 1997.
[AH99] R. Alur and T.A. Henzinger. Reactive modules. Formal Methods in System Design, 15:7-48, 1999.
[AL95] M. Abadi and L. Lamport. Conjoining specifications. ACM Transactions on Programming Languages and Systems, 17:507-534, 1995.
[BRJ98] G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User Guide. Addison-Wesley, 1998.

Fig. 7. Components LeadA and ErrordetectA

[DGH+99] J.
Davis, M. Goel, C. Hylands, B. Kienhuis, E.A. Lee, J. Liu, X. Liu, L. Muliadi, S. Neuendorffer, J. Reekie, N. Smyth, J. Tsay, and Y. Xiong. Overview of the Ptolemy project. Tech. Rep. UCB/ERL M99/37, University of California, Berkeley, 1999.
[DGV97] A. Deshpande, A. Gollu, and P. Varaiya. Shift: A formalism and a programming language for dynamic networks of hybrid automata. In Hybrid Systems, LNCS 1273, pp. 113-134, Springer-Verlag, 1997.
[Har87] D. Harel. Statecharts: A visual formalism for complex systems. Science of Computer Programming, 8:231-274, 1987.
[Hen96] T.A. Henzinger. The theory of hybrid automata. In Logic in Computer Science, pp. 278-292, IEEE Computer Society Press, 1996.
[Hen00] T.A. Henzinger. Masaccio: A formal model for embedded components. In Theoretical Computer Science, LNCS 1872, pp. 549-563, Springer-Verlag, 2000.
[LSVW96] N.A. Lynch, R. Segala, F. Vaandrager, and H.B. Weinberg. Hybrid I/O automata. In Hybrid Systems, LNCS 1066, pp. 496-510, Springer-Verlag, 1996.
[McM97] K.L. McMillan. A compositional rule for hardware design refinement. In Computer-aided Verification, LNCS 1254, pp. 24-35, Springer-Verlag, 1997.
[MC81] J. Misra and K.M. Chandy. Proofs of networks of processes. IEEE Transactions on Software Engineering, 7:417-426, 1981.
[TAKB96] S. Tasiran, R. Alur, R.P. Kurshan, and R.K. Brayton. Verifying abstractions of timed systems. In Concurrency Theory, LNCS 1119, pp. 546-562, Springer-Verlag, 1996.
[US94] A.C. Uselton and S.A. Smolka. A compositional semantics for Statecharts using labeled transition systems. In Concurrency Theory, LNCS 836, pp. 2-17, Springer-Verlag, 1994.

Appendix: Formal Definition of Masaccio

Let $V$ be a set of typed variables. For a variable $x \in V$, denote by $x'$ its primed version, and denote by $\dot x$ its dotted version. The type of $x'$ is the same as the type of $x$. The type of $\dot x$ is $\mathbb{R}$ if the type of $x$ is $\mathbb{R}$, and $\{0\}$ otherwise. This is because on types other than $\mathbb{R}$, we assume that only the constant functions are differentiable. Let $V' = \{x' \mid x \in V\}$ be the set of primed versions of the
variables in $V$, and let $\dot V = \{\dot x \mid x \in V\}$ be the set of dotted versions of the variables in $V$. Let $[V]$ be the set of type-conforming value assignments to the variables in $V$: if $x \in V$ and $q \in [V]$, let $q(x)$ be the value assigned by $q$ to $x$.

The interface of a component

The interface of a component $A$ consists of:

- A finite set $V_A^i$ of typed input variables.
- A finite set $V_A^o$ of typed output variables, such that $V_A^i \cap V_A^o = \emptyset$. Let $V_A = V_A^i \cup V_A^o$ be the set of I/O variables. The value assignments in $[V_A]$ are called I/O states.
- A dependency relation $\prec_A \subseteq V_A \times V_A^o$ between I/O variables and output variables, such that the transitive closure $\prec_A^{+}$ is asymmetric. A set $U \subseteq V_A$ of I/O variables is dependency-closed if $x \prec_A y$ and $y \in U$ implies $x \in U$.
- A finite set $L_A$ of interface locations.
- For each location $a \in L_A$, a predicate $\varphi_A^{jump}(a)$ on the variables in $V_A \cup V_A'$, called jump entry condition, and a predicate $\varphi_A^{flow}(a)$ on the variables in $V_A$, called flow entry condition.

The executions of a component

A jump of a component $A$ is a pair $(p, q) \in [V_A]^2$ of I/O states. The I/O state $p$ is the source of the jump, and $q$ is the sink. A flow of $A$ is a pair $(\delta, f)$ consisting of a nonnegative real $\delta \in \mathbb{R}_{\ge 0}$, and a function $f : \mathbb{R} \to$
$[V_A]$ from the reals to I/O states which is differentiable, with time derivative $\dot f$, on the compact interval $[0, \delta] \subset \mathbb{R}$. The real $\delta$ is the duration of the flow, the I/O state $f(0)$ is the source, and $f(\delta)$ is the sink. A step of $A$ is either a jump or a flow of $A$. The step $w$ is successive to the step $v$ if the sink of $v$ is equal to the source of $w$. An execution of $A$ is either a pair $(a, w)$ or a triple $(a, w, b)$, where $a, b \in L_A$ are interface locations, and $w = w_0 \cdots w_n$ is a finite, nonempty sequence of steps of $A$ such that (1) every step $w_i$, for 1 […] 0, then for all $\varepsilon \in [0, \delta]$, the flow predicate $\varphi_F^{flow}$ is true if each source variable $x \in X_F$ is assigned the value $f(\varepsilon)(x)$, and each dotted flow variable $\dot y$ with $y \in Y_F \cup Z_F$ is assigned the value $\dot f(\varepsilon)(y)$. The triple $(a, w, b)$ is an execution of $A(F)$ iff the pair $(a, w)$ is an execution of $A(F)$, and $b = to$.

Parallel composition

Two components $A$ and $B$ can be composed in parallel if their interfaces satisfy the following conditions:

- $V_A^o \cap V_B^o = \emptyset$.
- There are no two variables $x \in V_A$ and $y \in V_B$ such that both $x \prec_B y$ and $y \prec_A x$.
- For all $a \in L_A$, if $\varphi_A^{jump}(a)$ or $\varphi_A^{flow}(a)$ is satisfiable, then $a \in L_B$. For all $a \in L_B$, if $\varphi_B^{jump}(a)$ or $\varphi_B^{flow}(a)$ is satisfiable, then $a \in L_A$. For all $a \in L_A \cap L_B$, the projections of the entry conditions of $a$ in $A$ and $B$ to the common variables are equivalent: $(\exists V_A \setminus V_B)(\exists V_A' \setminus V_B')\, \varphi_A^{jump}(a)$ is equivalent to $(\exists V_B \setminus V_A)(\exists V_B' \setminus V_A')\, \varphi_B^{jump}(a)$, and $(\exists V_A \setminus V_B)\, \varphi_A^{flow}(a)$ is equivalent to $(\exists V_B \setminus V_A)\, \varphi_B^{flow}(a)$.

The interface of $A \| B$ is defined from the interfaces of $A$ and $B$:

- $V_{A\|B}^i = (V_A^i \setminus V_B^o) \cup (V_B^i \setminus V_A^o)$.
- $V_{A\|B}^o = V_A^o \cup V_B^o$.
- $\prec_{A\|B} \,=\, \prec_A \cup \prec_B$.
- $L_{A\|B} = L_A \cup L_B$.
- If $a \in L_A \cap L_B$, then $\varphi_{A\|B}^{jump}(a) = \varphi_A^{jump}(a) \wedge \varphi_B^{jump}(a)$ and $\varphi_{A\|B}^{flow}(a) = \varphi_A^{flow}(a) \wedge \varphi_B^{flow}(a)$. If $a \in L_A \setminus L_B$ or $a \in L_B \setminus L_A$, then $\varphi_{A\|B}^{jump}(a) = \varphi_{A\|B}^{flow}(a) = \mathit{false}$.

The executions of $A \| B$ are defined from the executions of $A$ and $B$. The pair $(a, w)$ is an execution of $A \| B$ iff $(a, w[V_A])$ is an execution of $A$ and (a,
w[VB]) is an execution of B The triple (a, w,b) is an execution of A||B iff either (a, w[VA],b) is an execution of A and (a, w[VB]) is an execution of B, or (a, w[VB],b) is an execution of B and (a, w[VA]) is an execution of A Serial composition Two components A and B can be composed in series if VA = VB The interface of A + B is defined from the interfaces of A and B: — va+b = va u VB — VA+B = va = Vbo — Va+b = Va U Vb — La+b = La U Lb XV — Tf p G T "   Tp Ci p i,  jump (") "J   flow (n    flow ( ) n a 2 ta ii t b , inen   a+B (a) — 1 a (a) v  в (a) anu   a+B (a) —  a (a) v "   flow ( ) Tf " c T   Tp U,,,, 1  jwmp ( )    jwmpr ) J "   flow ( ) " ,flow ( ) |f   в (a) ti a 2 ta   tb , uien  a+b (a) —   a (a) anu   a+в (a) — 1 a (a) ti ,, c T "   T " ti -i, 1   jwmp П) "  jnmp ( ) j    flow П) "   flow ( ) a 2 tb   ta, uien   a+b (a) —   в (a) anu Ta+b (a) —   в (a) The executions of A + B are defined from the executions of A and B The pair (a, w) is an execution of A + B iff either (a, w[VA]) is an execution of A, or (a, w[VB]) is an execution of B The triple (a, w,b) is an execution of A + B iff either (a, w[VA],b) is an execution of A, or (a, w[VB],b) is an execution of B Variable renaming The variable x 2 VA can be renamed to y in component A if y has the same type as x, and either y is not an T O variable of A, or x and y are both input variables; that is, if y 2 VA, then x,y 2 VA The interface of the component A[x :— y] is defined from the interface of A Tf x 2 VA, then Vi[x:=y] — (VA   {x}) U {y} and V°[x =y] — VA; if x 2 V°, then VA[x =y} — VA and Vff^y] — (VA   {x}) U {y} in either case, let TA[x:=y] — TA, and let - ,    jwmp(b) d    ,flow (b) — 1  ,flow (p)    i ,flow (b) {a}) U{b};let  A[a:=b] (b) — 1A (a) v Ta (b) and  A[a:=b](b) — 1 A (a) V 1A (b) if b 2 Ta, let ’ Am:=pb](b') —  Amp (a) and fOW^] (b) — Ta  (a) if b 2 Ta, and let    jump  ,)   Jwmp ,) d   ,flow (p) — T ,flow O- ) f"r "ll c h  Ttp i A [ ь] (c) —   A ( c) anu   a [ b] (c) —   A (c) юі all 
locations c 2 -T a   {a, b} ine executions of the component A[a :— b] result from renaming a to b in the origins and destinations of the executions of A Variable hiding The variable x 2 VA can be hidden in the component A if x 2 VA The interface of the component A x is defined from the interface of A: let Vi— VA; let Vi x — VA   {x}; let VA x be the intersection of the transitive closure VA with VA x x Vo   lot   — lpt u dumpA,) — Пр) u dumpA,) я-nrd  flow (n) — Пр)  flow (n) VA x; let TA x — TA; let A x (a) — (9x) TA (a) and VA x (a) — (9x) TA (a) for all locations a 2 TA The executions of the component A x are defined from the executions of A The pair (a; w) is an execution of A x iff (a; w[VA x]) is an execution of A The triple (a; w; b) is an execution of A x iff (a; w[VA x]; b) is an execution of A Location hiding The interface location c 2 TA can be hidden in the component A if the jump entry condition  j^mp(c) is equivalent to true The interface of the component A c is defined from the interface of A: let VA " — VA; let V° " — VA; let - 2, such that w — w-1 • • • wn and the following are executions of A: the triple (a, wi,c), the triples (c, wA,c) for 1 2, such that w — w-1 • • • wn and the following are executions of A: the triple (a, wi,c), the triples (c, wA,c) for 1 0 Boolean formulas can be constructed from the state variables of the model A formula is said to be satisfied in a state if and only if the assignment of variable values in the state to the corresponding variables in the formula makes it true in general, a formula can be satisfied in many states, and we identify a formula with the set of states that satisfy it Boolean formulas can be represented canonically by binary decision diagrams (BDDs) Efficient algorithms exist for computing all logical operations on BDDs, as well as for computing existential quantification Symbolic model checking exploits this efficiency by operating on sets of States represented internally by BDDs For example, the 
BDD representing $T(S) = \{s' \mid N(s, s') \text{ holds for some } s \in S\}$, the set of all successors of states in a state set $S$, can be easily constructed from the BDD for $S$ and the BDD for the transition relation in one step, regardless of the number of states in $S$ and $T(S)$.

The properties to be verified by the model checker are expressed in computation tree logic, CTL. Computation trees are derived from state transition graphs. The graph structure is unwound into an infinite tree rooted at the initial state. Paths in this tree represent all possible computations of the program being modelled. Formulas in CTL refer to the computation tree derived from the model. CTL is classified as a branching time logic, because it has operators that describe the branching structure of this tree. Formulas in CTL are built from atomic propositions (in our method, each proposition corresponds to a state variable in the model), boolean connectives $\neg$ and $\wedge$, and temporal operators. Each operator consists of two parts: a path quantifier followed by a temporal operator. Path quantifiers indicate that the property should be true of all paths from a given state (A), or some path from a given state (E). The temporal operators describe how events are ordered with respect to time for a path specified by the path quantifier. They have the following informal meanings:

• F $\varphi$ ($\varphi$ holds sometime in the future) is true of a path iff there exists a state on the path that satisfies $\varphi$.
• G $\varphi$ ($\varphi$ holds globally) is true for a path iff $\varphi$ is satisfied by all states on the path.
• X $\varphi$ ($\varphi$ holds in the next state) is true iff $\varphi$ is satisfied in the next state of the path.
• $\varphi$ U $\psi$ ($\varphi$ until $\psi$) is satisfied by a path iff $\psi$ is true in some state on the path, and $\varphi$ holds in all preceding states.

Bounded versions of the temporal operators exist. They allow the expression of time-bounded properties, which can be used to verify the real-time behavior of systems. Some examples of CTL formulas are given below to illustrate the expressiveness of the
logic.

• AG(req → AF ack): it is always the case that if the signal req is high, then eventually ack will also be high.
• AG(send → A[send U recv]): it is always the case that if send occurs, then eventually recv is true, and until that time, send must remain true.

3 Algorithms for Minimum and Maximum Delay

This section presents algorithms for computing minimum and maximum time delays between specified events. All computations are performed on states reachable from a predefined set of initial states. We also assume that the transition relation is total.

We consider the minimum delay algorithm first (figure 1). It returns the length of (i.e., number of edges in) a shortest path from a state in the state set start to a state in the state set final. If no such path exists, the algorithm returns infinity. Recall that the function $T(S)$ gives the set of states that are successors of some state in $S$. The function $T$, the state sets $R$ and $R'$, and the operations of intersection and union can all be easily implemented using BDDs. The first algorithm is relatively straightforward. Intuitively, the loop in the algorithm computes the set of states that are reachable from start. If at any point we encounter a state satisfying final, we return the number of steps taken to reach that state.

The maximum delay algorithm returns the length of a longest path from a state in the state set start to a state in the state set final. If there exists an infinite path beginning in a state in start that never reaches a state in final, the algorithm returns infinity. The function $T^{-1}(S')$ gives the set of states that are predecessors of some state in $S'$ (i.e., $T^{-1}(S') = \{s \mid N(s, s') \text{ holds for some } s' \in S'\}$). We also denote by not_final the set of all states that are not in final. As before, the algorithm is implemented using BDDs; however, a backward search is required in this case.

4 Condition Counting Algorithms

In many situations we are interested not only in the length of a path from a set of starting states
to a set of final states, but also in measures that depend on the number of states on the path that satisfy a given condition.

    proc minimum (start, final)
      i = 0;
      R = start;
      R' = T(R) ∪ R;
      while (R' ≠ R ∧ R ∩ final = ∅) do
        i = i + 1;
        R = R';
        R' = T(R') ∪ R';
      if (R ∩ final ≠ ∅) then return i;
      else return ∞;

    proc maximum (start, final)
      i = 0;
      R = TRUE;
      R' = not_final;
      while (R' ≠ R ∧ R' ∩ start ≠ ∅) do
        i = i + 1;
        R = R';
        R' = T^{-1}(R') ∩ not_final;
      if (R = R') then return ∞;
      else return i;

Figure 1: Minimum and Maximum Delay Algorithms

Both algorithms in this section take as input three sets of states: start, cond and final. The algorithms compute the minimum and the maximum number of states that belong to cond, over all finite paths that begin with a state in start and terminate upon reaching final. To guarantee that the minimum (maximum) is well-defined, we assume that any path beginning in start must reach a state in final in a finite number of steps. This can be checked using the maximum delay algorithm described in the previous section. Finally, we ensure that all computations involve only reachable states, by intersecting start with the set of reachable states computed a priori.

To keep track at each step of the number of states in cond that have been traversed, we define a new state-transition system, in which the states are pairs consisting of a state in the original system and a nonnegative integer. Thus, if the original state-transition graph has state set $S$, then the augmented state set will be $S_a = S \times \mathbb{N}$. If $N \subseteq S \times S$ is the transition relation for the original state-transition graph, we define the augmented transition relation $N_a \subseteq S_a \times S_a$ as

$N_a(\langle s, k\rangle, \langle s', k'\rangle) = N(s, s') \wedge (s' \in cond \wedge k' = k + 1 \;\vee\; s' \notin cond \wedge k' = k)$

In other words, there will be a transition from $\langle s, k\rangle$ to $\langle s', k'\rangle$ in the augmented transition relation $N_a$ iff there is a transition from $s$ to $s'$ in the original transition relation $N$ and either $s' \in cond$ and $k' = k + 1$, or $s' \notin cond$ and $k' = k$. We also define T
as the function that returns the set of successors of all states in a given set $U \subseteq S_a$. More formally, $T(U) = \{u' \mid N_a(u, u') \text{ holds for some } u \in U\}$. In the actual BDD-based implementation, an initial bound $k_{max}$ can be selected to achieve a finite representation for $k$, and new BDD variables can be added dynamically if this bound is exceeded. The system is still finite-state because all paths we consider are finite and $k$ is bounded by their maximum length.

The algorithm for computing the minimum count is

    proc mincount (start, cond, final)
      current_min = ∞;
      R = {⟨s, 1⟩ | s ∈ start ∩ cond} ∪ {⟨s, 0⟩ | s ∈ start ∩ ¬cond};
      loop
        Reached_final = R ∩ Final;
        if Reached_final ≠ ∅ then
          m = min{k | ⟨s, k⟩ ∈ Reached_final};
          if m […]

[…] AF GNT)
AG(start_transaction → AF end_transaction)

The properties above show that the response time of PCI transactions is bounded, but they give no indication of their performance. We will use the algorithms described in sections 3 and 4 to determine the response time for transactions. The results of our quantitative analysis also determine the correctness of the algorithm; for example, a transaction always finishes if its maximum response time is less than infinity.

[Figure 5: Response times for global round-robin policy (times are [min,max]). Columns: Bus Master, Arbitration, Bus acquis., Tot. bus acquis., Target, Total trans. Rows: ISA, SCSI, Video, Proc.]

In our performance analysis we will follow the structure of the protocol by computing the response time for each phase of the transaction separately. In this way we can have a better understanding of the behavior of the protocol. By computing the latency of each phase we are able to assert the efficiency of each step in the protocol and obtain the global behavior by adding individual figures. Results will be grouped into two categories, total bus acquisition latency and total transaction latency. The first category corresponds to the total time between a request being made on the bus and the subsystem actually being able to use
the bus. The second category represents the total usage of the bus, that is, the time between asserting the FRAME signal until the end of data transfer.

Table 5 shows the response times when the arbitration policy is set to round-robin in all banks and transaction cancelling is not allowed. Notice that in all cases discussed in this paper the latency for the data transfer phase varies between 1 and 16 clock cycles; there is no overhead associated with it. For that reason, this column will not be shown in the tables.

From the table above we can see two interesting properties of the system. The total transaction latency is at most 18 clock cycles, and in this case 16 clock cycles of data are transmitted. This means that once a master is able to use the bus, it can send data very efficiently. Another characteristic of the protocol is reflected in the bus acquisition times. The maximum of 18 cycles corresponds to one transaction. After being granted the bus, the new master may have to wait for at most one more transaction to complete. This shows that once the bus is granted to a master, it will not be granted to another before the first one issues its transaction. Therefore no starvation can occur after a master is granted the bus. This property can be verified by:

AG(GNT → A[GNT U FRAME])

A more intriguing result can be seen in the arbitration latency results. The first two subsystems can take almost twice as long to access the bus as the others. In a round-robin environment, all subsystems should be granted equal usage of the resource, but this is not true in our example. By analyzing the execution traces produced by our tools we are able to determine the reason for the unfair access to the bus. The problem arises from the connection of the request lines to the arbiter as seen in figure ??.
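The effect of request-line wiring on fairness can be illustrated with a small explicit simulation. The Python sketch below is only an illustration under stated assumptions and is not the model analyzed here: the function names, the bank layout (one bank shared by two masters, two banks with a single master each), and the assumption that every master requests the bus continuously are all hypothetical simplifications. It measures, for each master, the largest number of grant slots between two consecutive grants.

```python
def two_level_round_robin(banks, n_grants):
    """Grant sequence of a toy two-level round-robin arbiter.

    banks: list of lists of master names (hypothetical wiring).
    Assumption: every master requests the bus all the time.
    The top level rotates over banks; each bank rotates over its masters.
    """
    inner = [0] * len(banks)          # next master to serve inside each bank
    grants = []
    for step in range(n_grants):
        b = step % len(banks)         # top-level round robin over banks
        grants.append(banks[b][inner[b]])
        inner[b] = (inner[b] + 1) % len(banks[b])  # bank-level round robin
    return grants


def max_gap(grants, master):
    """Largest distance, in grant slots, between consecutive grants to master."""
    slots = [i for i, g in enumerate(grants) if g == master]
    return max(later - earlier for earlier, later in zip(slots, slots[1:]))


# Assumed layout: ISA and SCSI share a bank, Video and Proc have their own.
banks = [["ISA", "SCSI"], ["Video"], ["Proc"]]
grants = two_level_round_robin(banks, 60)
gaps = {m: max_gap(grants, m) for bank in banks for m in bank}
print(gaps)  # {'ISA': 6, 'SCSI': 6, 'Video': 3, 'Proc': 3}
```

In this toy model the two masters that share a bank wait twice as many grant slots as the masters that sit alone in a bank, even though each arbitration level is a fair round robin on its own.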
The ISA bridge and the SCSI controller are connected together to bank 0, while the video and the processor subsystems are alone in their banks. If bus traffic is high, the ISA bridge and the SCSI subsystems may have to wait for one another before their request reaches bank 2. Subsequently they may have to wait for subsystems connected to the other banks to execute before being granted the bus. In other words, they compete in both levels of arbitration, while the other subsystems only compete in the last level. This causes the worst-case latency to be approximately twice as long for these subsystems. We can conclude from these results that two-level arbitration may behave differently from an equivalent one-level arbiter. In this case the problem is caused by an asymmetric connection of request lines.

We can also use these results to analyze the overhead imposed by the communication protocol on the transaction time. We have already seen that after asserting the FRAME signal there is an overhead of 2 clock cycles. This overhead is independent of the transfer size. If a transaction is allowed to transfer more than 16 cache lines of data at once, the total utilization of the bus will increase. The designers of the bus can use this information to determine which is the best transfer size for a given system. The following two formulas have been used to verify the above statements:

AG(FRAME → AF […]

[…] $D$ induces an equivalence relation $\equiv$ on vectors in $B^i$: for $x, y \in B^i$ define $x \equiv y$ iff $a^i(x) = a^i(y)$. The equivalence relation $\equiv$ partitions the 0-1 vectors into equivalence classes. We choose a unique representative from each equivalence class and construct a representative function $h^i$ such that $h^i(x)$ is the unique representative in the equivalence class of $x$. From the initial abstraction function $a$ we have thus generated a function $h^i : B^i \to B^i$. If $n$ is the length of $x$, then we write $h(x)$ to denote $h^n(x)$.

1063-6404/97 $10.00 © 1997 IEEE

It is easy to see that $h$ is idempotent, i.e., h(h(x)) =
h(x) Next, we define what it means for the abstraction function function h to be consistent Definition 1 An abstraction function h : B^ -" (J" BJ is consistent iff for all 1   z[h(x • z) = h(y • z)] For example, consider the abstraction function h : (JJ B-7 -> (JJ Bi induced by a linear abstraction function a‘,i = 0, • • •, n — 1 defined below: i a'(xq,      , xf) — 57 xi 3=0 where bj (0 В be an n-argument boolean function An abstraction function h : Bn -> Bn induces a transformation on boolean functions according to the following relation We denote the transformed function as fh and define it as follows: fh = foh Lemma 1 Let f,p,q : Bn В be boolean functions, © any logical operation, and h : Bn -" Bn be an abstraction function iff = p&q, then fh -Ph qh- Proof: Let     B" be an arbitrary vector We have the following equations: fh(x) - (foh)(x) = f(h(x)) = ?(*(*)) Ѳ q(h(x)) - Ph(x) © qh(x) The result follows □ Next, we show how the above results can be applied when representing boolean functions by binary decision graphs Definition 2 A levelized binary decision graph (levelized BDG) with n levels is a 7-tuple (V, left, right, level ,to,t ,root), where • V is the set of nodes • level : V {0, • • •, n} • left : (V-{to,M) Vistheleftchildfunctionwith restriction: level(v) = level(left(v)) — 1 • right : (Ѵ-{ о,й}) —> V is the right child function withrestriction: level(v) = level(right(v)) — 1 • h 6 V is the zero node with level(v) = n • ti   V is the one node with level(v) = n • root   V is the distinguished root node and level (root) = 0 • For all v   V-{root,to,t }, 1 V Letp   Вг bea 0-1 vector or path of length i node? 
(p) = v iff we get to node v by following the path p from the root Notice that a levelized BDG T corresponds to a boolean function b(T) : Bn В in the following manner: •b(T)((yi,-   • ,yn)) = 0iff nodeT((y ,-    ,yn)) =t0 •b(T)((y ,       ,yn)) = liffnodeT((t i, -,t n)) = tb Given an abstraction function h : Bn —> D, we show how to construct an abstract levelized BDG from a given levelized BDG Without loss of generality, assume we have cho-sen the representative a: of an equivalence class to be the lexi-cographically least element in that equivalence class Therefore, if i- Proof: Let pi and pz be two paths of length i such that  i(pi) = in the levelized BDG Д, we have nodeph{p ) — nodeTh(pff Thus, if two paths pi and pz agree on the abstraction value, then they lead to the same node Hence, at each level the number of nodes in the levelized BDG Th is bounded by the size of the range of h □ Let © be an arbitrary operation on boolean functions The lemma given below states that abstraction of levelized BDGs can be performed compositionally Lemma 4 Assume that we have three levelized BDGs T, T  andT2 ifb(T) = b(Tl) © 6(T2), then we have the following equation: b(Th) = Ь(П)&Ь(Т2) Proof: The proof follows from the following equations: b(Th) = b(T)h (By Lemma 2) = b(T1)h © b(T2)h (By Lemma 1) = © bfa) (By Lemma 2) Figure 2 BDD and levelized BDD for (xo V an) Levelized BDDs are obtained from levelized BDGs in the following manner: Given a levelized BDG T, we merge two nodes v and v' (whose level is the same) iff the subtrees rooted at them are isomorphic Reduced ordered BDDs, on the other hand, add an extra level of optimization because redundant nodes are removed, as described in For example, Figure 2 gives the BDD for the function ( o V x i) and the corresponding levelized BDD Because of the merging of isomorphic subtrees, we must modify algorithm DFS We caii our new algorithm BDD-DFS and it is described in Figure 3 in the algorithmgiven in Figure 3, Sub(v) denotes the 
subtree rooted at the node v Sub(v) w Sub(v') means that the trees rooted at v and v' are isomorphic Given a levelized BDD T, the BDD Th obtained from algorithm BDDJDFS is called an abstract BDD or aBDD Notice that the aBDD obtained in this manner is also levelized The definition of aBDDs and the properties proved in this chapter can be easily generalized for different abstraction functions used at each level 334 ditions M(f) is a canonical representation for f function BDD-DFS(v,path) p' = h(path)  itp'   path v' = lookup-cache(p'); return v'; else if nonterminal(v) leftfy) = DFS(left(v),path • (0)); right(v) = DFS(right(y), path • (1)); if there exists vi in cache such that Sub(v) и Sub(yt) retum vi; endif; endif; insert cache{p', v); retum v; endif Figure 3 Modified DFS ol levelized BDDs 3 Uniqueness of Representation Assume that we have two boolean functions f and g and let Tt and Tg be the levelized BDDs for   and g Given an abstraction function h, h(Tj)   h(Tg) implies that f g, but h(Tf) = h(Tg) does not necessarily imply that f = g in other words, aBDDs are not canonical in this section we prove that with some restrictions we can obtain the canon-icity property for a group of functions A set of abstraction functions { "i, • • •, Лр}, where Л,- : Bn —> Bn for 1 Bn be an abstraction function Given a function f : Bn -> Bn, we represent it by a vector of n boolean functions ( i, •   •, fn) Assume that we are given a function f and a set of p abstraction functions {Лі, • • •, hp} where hi : Bn Bn Let ( i > • >  n) be the array of boolean functions corresponding to f Let be the levelized BDD corresponding to the boolean function o hj Let M ( ) be a m x p ma-trix of aBDDs such that M(f)ij = T’J The matrix is schematically shown below: j4,l t2,  T^P tm,} The theorem given below proves that under certain con- Theorem 1 Assume that f : Bn Bn and g : Bn -v Bn are two functions Let (fi, • • •,  n) and ( Bn (1 Bn which multiples two integers with у bits (we are assuming 
no overflow). For a vector $x = (x_0, \ldots, x_{n-1}) \in B^n$ we define $\mathrm{val}(x)$ to be the integer whose binary representation is $x$. The function mult is defined by the following equation:

$\mathrm{val}(\mathit{mult}(x)) = \mathrm{val}(x_0, \ldots, x_{n/2-1}) * \mathrm{val}(x_{n/2}, \ldots, x_{n-1})$

Assume that we have $m$ pairwise relatively prime positive integers $p_1, \ldots, p_m$ such that $p_1 p_2 \cdots p_m > 2^n$. Let $h_i : B^n \to B^n$ be the abstraction function corresponding to taking the residue with respect to $p_i$. By the Chinese remainder theorem, the set of abstraction functions $\{h_1, \ldots, h_m\}$ preserves the domain $B^n$. Moreover, $h_i \circ f = h_i \circ f \circ h_i$ because $(x * y) \bmod p_i = ((x \bmod p_i) * (y \bmod p_i)) \bmod p_i$ for any positive integer $p_i$ ($*$ denotes the multiplication of integers). Translated into our notation, the equation given above becomes $h_i \circ \mathit{mult} = h_i \circ \mathit{mult} \circ h_i$. Therefore, mult satisfies the condition in the hypothesis and the theorem applies to this case. More generally, this theorem will also be true when $f$ satisfies $h_i \circ f = f \circ h_i$.

4 Equivalence checking using aBDDs

Because of their bounded size, aBDDs can be used to verify the equivalence of large circuits. In general, residue functions are good abstractions for arithmetic circuits. Symmetric and linear functions tend to work better for control logic. If the circuit has symmetric inputs, a symmetric abstraction function should definitely be used. These conclusions are supported by the experimental data in section 5. The overall procedure is as follows.

1. Given a circuit, choose a set of appropriate abstraction functions. This set will be provided based on the nature of the circuit.
2. Select an abstraction function $h$ out of the set of abstraction functions.
3. Build aBDDs for the specification and the implementation circuit using the abstraction function $h$.
4. Compare the two aBDDs that are obtained for specification and implementation. If they are different, an error is detected. Otherwise, choose a different abstraction function from the set and repeat step 3.

In general, there is no procedure to select a set of
abstraction functions that will detect all errors in a circuit. Nevertheless, we believe that our methodology can be extremely useful in practice, since an initial design is much more likely to contain errors than to be correct.

Next, we give a description of our algorithm. Since our algorithm to build an aBDD assumes that we are working with a levelized BDD, we have to levelize a BDD before we apply our abstraction algorithm. For example, assume that $f = p \wedge q$ and that we have already built the aBDDs for $p$ and $q$ (with respect to the abstraction function $h$). Let us call these aBDDs $h(T_p)$ and $h(T_q)$. Next, we build the BDD corresponding to $h(T_p) \wedge h(T_q)$. Finally we levelize the BDD and apply our abstraction algorithm to obtain the aBDD for $f$.

We use a simple example to illustrate the algorithm. Assume that we have an abstraction function $h$ for the circuit in Figure 4. The aBDD associated with a signal $z$ is $h(T_z)$. At the beginning, assume that we have aBDDs for the inputs a, b, c, d. By performing the nand operation, we form the BDD $T_e = \neg[h(T_a) \wedge h(T_b)]$. Next we perform abstraction on the levelized BDD of $T_e$ and obtain the aBDD $h(T_e)$. The same procedure is performed on f. After we obtain aBDDs for e and f, we compute the aBDD for the output g by using the same method.

5 Experimental Results

We have implemented our algorithm in C. Our experiments were performed on a Sun SPARC 10 workstation with 200 Mbytes of memory. The experiments were performed on the ISCAS'85 benchmark circuits. Faults in the circuit were injected one by one by selecting a stuck-at fault on one input of an arbitrary gate. Table 1 compares our method with ordinary BDD equivalence checking. In the table, two abstraction functions are used. One is the symmetric abstraction function

$a^i(x_0, \ldots, x_i) = \sum_{j=0}^{i} x_j, \quad i = 0, \ldots, n-1$

and the second one is the residue function

$a^i(x_0, \ldots, x_i) = \sum_{j=0}^{i} 2^j x_j \bmod n, \quad i = 0, \ldots, n-1$

In Table 1, Det Errs is the number of faults detected by these three methods, and Max # Nodes is the maximum number of BDD nodes that need to be held in memory, which is usually much larger than the final BDD size. Avg Time is the average time to detect a design error. The OBDD results for c2670, c5315, c6288 and c7552 are not reported because they exceeded the memory limit.

The experimental results show that using aBDDs it is possible to detect a high percentage of design errors (between 40% and 90% of the errors are detected using symmetric abstraction functions alone). The reduction in BDD size for some circuits is over two orders of magnitude. Since the size of aBDDs is bounded, this reduction will be much more significant in real industrial circuits. For most of the circuits, residue abstraction functions do not detect as many errors as symmetric abstraction functions. We choose the number of variables as a modulus because both symmetric and residue abstraction functions produce BDDs of similar size. The results support our argument that residue functions may not be a good abstraction for control circuits.

                      Det Errs               Max # Nodes                  Avg Time
circuits  Errs   OBDD    Symm  Resid    OBDD     Symm    Resid      OBDD    Symm      Resid
c432      50     50      50    33       4712     4604    3902       1.15    7.94      19.70
c499      50     50      40    28       95745    9481    27121      22.74   16.72     48.64
c880      50     50      28    7        637338   7705    4999       138.25  58.17     180.56
c1355     50     50      40    28       96357    9497    27476      25.49   44.93     129.48
c1908     50     48      40    36       70196    6274    15838      35.95   22.82     61.86
c2670     10     unable  5     2        -        132593  774009     -       5449.37   5073.17
c3540     50     50      24    16       1522988  9927    8267       299.89  109.61    379.06
c5315     10     unable  10    3        -        208795  234716     -       4618.01   10052.3
c6288     10     unable  6     6        -        7317    38         -       86.52     61.20
c7552     10     unable  9     10       -        366462  2301523    -       11963.65  18405.8

Table 1: Comparison of equivalence checking using OBDDs and aBDDs

6 Conclusion

In this paper, we introduce a general framework for applying abstraction to BDDs. Abstract BDDs (aBDDs) are of bounded size and can be constructed directly from the circuit, without first generating the original BDDs. This technique makes it possible to show inequivalence of combinational circuits. If the aBDDs for two circuits are different, then the circuits correspond to two different boolean functions. On the other hand, if the two aBDDs are identical, the circuits may still be different. In spite of this lack of completeness, experimental results show that the technique is able to find a surprisingly large number of errors in practice. This is important because circuits tend to contain errors much more frequently than they are correct. Moreover, we identify an important class of functions for which this technique is complete. This class includes many common arithmetic circuits, including integer multiplication. We are currently investigating probabilistic techniques for estimating the error coverage obtained using this method.

References

Randal E. Bryant, "Graph-Based Algorithms for Boolean Function Manipulation", IEEE Trans. on Comput., Vol. C-35, No. 8, pp. 677-691, Aug. 1986.
Karl S. Brace, Richard L. Rudell, Randal E. Bryant, "Efficient Implementation of a BDD Package", 27th Design Automation Conference, pp. 40-45, 1990.
Edmund M. Clarke, Orna Grumberg, David E. Long, "Model Checking and Abstraction", ACM Transactions on Programming Languages and Systems, Vol. 16, No. 5, pp. 1512-1542, Sept. 1994.
Shinji Kimura, "Residue BDD and its Application to the Verification of Arithmetic Circuits", 32nd Design Automation Conference, 1995.
Hiroyuki Ochi, Koichi Yasuoka, Shuzo Yajima, "Breadth-First Manipulation of Very Large Binary-Decision Diagrams", Proc. Intl. Conf. Comput. Aided Design, pp. 48-55, 1993.
Kavita Ravi, Abelardo Pardo, Gary D. Hachtel, Fabio Somenzi, "Modular Verification of Multipliers", Formal Methods in Computer-Aided Design, pp. 49-63, Nov. 1996.
Richard Rudell, "Dynamic Variable Ordering for Ordered Binary Decision Diagrams", Proc. Intl. Conf. Comput. Aided Design, pp. 42-47, 1993.

Finding Errors in Python Programs Using Dynamic Symbolic Execution

Samir Sapra1, Marius Minea2, Sagar Chaki1, Arie Gurfinkel1, and Edmund M. Clarke1

1 Carnegie Mellon University,
Pittsburgh, PA, USA **
2 Politehnica University of Timisoara, Romania

Abstract. For statically typed languages, dynamic symbolic execution (also called concolic testing) is a mature approach to automated test generation. However, extending it to dynamic languages presents several challenges. Complex semantics, fragmented and incomplete type information, and calls to foreign functions lacking precise models make symbolic execution difficult. We propose a symbolic execution approach that mixes concrete and symbolic values and incrementally solves path constraints in search of alternate executions, lazily instantiating axiomatizations for called functions as needed. We present the symbolic execution model underlying this approach and illustrate the workings of our prototype concolic testing tool on an actual Python software package.

1 Introduction

Dynamic symbolic execution (DSE) has been very successful for generating tests and finding errors. It accumulates path constraints over symbolic inputs rather than executing with concrete values. Java Pathfinder, Pex, or KLEE try all possible program paths, using full symbolic models also for environment interactions. Concolic testing, a variant used in DART, CUTE and Crest, generates symbolic constraints guided by concrete executions, and then modifies them to explore alternate paths. Approximating some execution fragments through concretization proves useful even in the absence of complete models.

Compared to existing work in the context of static typing, symbolic execution for Python as a dynamically typed language raises a series of new challenges:

i) The language complexity makes symbolic execution difficult. First, a more complex theory is needed to express path conditions precisely. Python objects have dictionaries of attributes and hence one needs to handle strings and maps. Dictionary keys can be added dynamically and can be arbitrary hashables, not just integers or strings. A variety of runtime errors and exceptions
related to dynamic features are handled in different ways. Moreover, Python is often used to glue together components in other languages, for which we may not have models. Library functions are often in native code, thus values become concretized during execution and can no longer be tracked symbolically.

This work avoids the cost and complexity of eager full symbolic execution by using path constraints that mix concrete and symbolic values. These constraints are solved incrementally in a search for satisfying program inputs, lazily instantiating axiomatized models of executed functions as they are needed.

** This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. This material has been approved for public release and unlimited distribution. DM-0000479

ii) Type information for objects is incomplete and fragmented. Type constraints are implicitly accumulated from successful runtime checks (objects must have the accessed attributes, be iterable, callable, etc.). An object's type may not be completely known: x[1] could be indexing a list, tuple, string, dictionary, or user-defined type. This complicates formalizing and tracking type constraints and also means a program can hide many more bugs. Since many conditions can be flipped
to explore alternate types and program paths, it is crucial to steer this search efficiently and avoid exploring uninteresting execution paths.

This work selects relevant conditions based on data dependencies, and uses the solver output (unsatisfiable core) to direct the choice of alternate paths.

A Motivating Example. Version 0.93 of dnuos (https://bitheap.org/dnuos/), which creates collections of audio files, crashes on an empty directory. The bug is in function uniq (line 2 in Fig. 1): it accesses a list without a non-emptiness check. A faulty run has uniq called from types (l. 7) on a list created with map (l. 6) from method streams, which filters (l. 10) a list returned by children. The latter iterates (l. 13) over a list produced by os.listdir for the input pathname.

     1  def uniq(list):
     2      list[0] = [ list[0] ]
     3      return reduce(lambda A,x: x in A and A or A+[x], list)
     4  def types(self):
     5      if self._types != None: return self._types
     6      types = map(lambda x: x.type(), self.streams())
     7      self._types = uniq(types)
     9  def streams(self):
    10      list = filter(is_audio_file, self.children())
    11  def children(self):
    12      if self._children: return self._children
    13      self._children = map(lambda x: os.path.join(self.path,x), os.listdir(self.path))
    14      return self._children

Fig. 1. Code fragments from dnuos for processing a list of audio files

The challenges are: (i) to detect such errors automatically, i.e., finding a buggy path starting from a successful run (here, on a non-empty directory); and (ii) applying DSE in coverage mode to detect as many errors as possible.

We first describe in Sec. 2 the architecture of our concolic testing engine and the systematic search for alternate execution paths by lazily instantiating the needed constraints. Sec. 3 then briefly illustrates key aspects of i) the formalism used to represent Python path conditions, ii) the symbolic bytecode semantics, and iii) the axiomatization of library functions. Finally, in Sec. 4, we show how these are tied together in our prototype CutiePy by revisiting the above example.

2 An
Architecture for Concolic Testing

Dynamic symbolic execution is driven by a concrete execution with some initial inputs. A symbolic execution engine is run in lockstep, working with symbolic constraints over program variables, rather than concrete values. Given formal semantics for every instruction, these constraints can be accumulated in a path condition, which includes all branch conditions taken on the path, and characterizes all inputs for which the program will take the same path. To explore a new path, a branch condition is flipped and, together with the path condition leading to it, is passed to a solver which returns inputs to exercise the new path.

Our symbolic execution framework is distinguished by how constraints are expressed and collected for each instruction, and how branches are flipped. We describe the former in Sec. 3, including what to do when fully symbolic execution is not feasible. Here we outline how to find new executions (Algorithm 1).

A path condition is a list of clauses that are either definitions or conditions. Definitions have the form $v = f(s_1, \ldots, s_k)$ with $v$ a variable and $s_j$ constants or variables. Conditions can be explicit (program branch or loop conditions) or implicit, denoting statement execution without error (e.g., predicates hasattr, iscallable, isiterable). Program and library functions can be interpreted (fully formalized, cf. Sec. 3) or uninterpreted, if there is no complete model for them.

Given a path condition, the dependence set $Dep(C)$ of a clause is defined as:
- for a condition, $Dep(C)$ is the set of clauses that share variables with $C$
- for a definition $v = f(s_1, \ldots, s_k)$, $Dep(C)$ is the set of all conditions that contain variables from $C$, plus any definitions of variables in the right-hand side of $C$

Define $Dep^+(C)$ transitively as the smallest set such that $Dep(C) \subseteq Dep^+(C)$ and if $C' \in Dep^+(C)$ is an interpreted clause, then $Dep(C') \subseteq Dep^+(C)$. Let $I$ be the set of program inputs and $FV(\mathcal{C})$ be the set of variables in the set of
clauses $\mathcal{C}$ that do not appear on the left-hand side of a definition.

Algorithm 1 Selection of alternate execution paths
1: function FLIP([$r_1, r_2, \ldots, r_k$], flipped)
2:   $\Phi \leftarrow$ flipped $\cup\ Dep^+$(flipped)
3:   while sat($\Phi$) do
4:     if $FV(\Phi) \subseteq I$ then return sat assignment
5:     else strengthen $\Phi$ with lemmas for $FV(\Phi) \setminus I$
6:   $u \leftarrow \max\{i \mid$ [...]

[...]

Rule 1 (premises: $e'$ symbolic; typeof($o'$) == PyList; $o$ non-negative):
    concrete:  $C,\ o :: o' :: S \rightarrow C,\ o'' :: S$
    symbolic:  $\Gamma,\ e :: e' :: S \Rightarrow \Gamma,\ Select(e', e) :: S$
    assert typeof $e$ = PyInt $\wedge$ typeof $e'$ = PyList
    assert hasattr __getitem__ $e'$ $\wedge$ lenof $e' \geq e + 1$

Rule 2 (premise: $e'$ symbolic):
    concrete:  $C,\ o :: o' :: S \rightarrow C,\ o'' :: S$
    symbolic:  $\Gamma,\ e :: e' :: S \Rightarrow \Gamma,\ o'' :: S$
    assert hasattr __getitem__ $e'$

Both rules track the indexed collection $e'$ symbolically. In rule 1, $o'$ has the actual type PyList, and successful execution implies that $o$ is an integer, which allows us to derive size constraints and track the result of subscripting symbolically. In rule 2, we assert (append to the path condition) the only constraint we learn, hasattr __getitem__ $e'$, and we push the concrete result $o'' = o'[o]$ onto the symbolic stack.

Concretization when calling functions that lack models (e.g., native code) is one of the main obstacles to building symbolic path conditions. On return from such a function, we re-introduce a symbolic variable for the result. This helps track data flow in spite of concretization. Treating the function as uninterpreted also allows us to lazily ignore conditions which are irrelevant for the testing goal.

For mutable objects, we exploit the Python bytecode interpreter to introduce the additional indirection needed to update referrers. For dictionaries, we model only fields that are updated with symbolic values. In both cases, we limit symbolic modeling to items that are strictly needed.

4 Case Study and Conclusions

We explain a run of our tool CutiePy on the example of Fig. 1. The unit test executes the call audiodir.Dir(i_filename).types() on input i_filename= [...], in an environment where the directory /dummy contains a single, valid music file. CutiePy produces a path condition with about 400 constraints (after
instantiating partial interpretations). Of particular interest is generating a test that exercises the unchecked list access on line 2, which is compiled to bytecode BINARY_SUBSCR. Since that list is produced by the built-in function map (Fig. 1, l. 6) written in C, we need a partial interpretation to reason about it. To achieve this, CutiePy replaces the standard map with a workable model given by the Python function:

    audiodir.__builtins__['map'] = lambda f, L: [f(x) for x in L]

Such models are inserted up-front and are present at what we designate as the 'first' call to FLIP. Thus, the path condition fragment $\Phi$ sent to Z3 by FLIP at line 3 is:

    p385 == Not(lenof(v148) >= 1)                      flipped constraint
    p384 == (typeof(v148) == PyList)
    p377 == (v141, v148) == LIST_APPEND(v143, v147)    call to APPEND
    p364 == (lenof(v143) == 0)                         execution of map model
    p363 == (typeof(v143) == PyList)                   v143 empty initial list

Combined with the APPEND axiom of Sec. 3, Z3 finds these clauses inconsistent (unsatisfiable core: [p385, p377, p364, p363]). To break the unsat core, keeping a maximal execution prefix, lines 6-7 identify the condition Not(exhausted(v144)) as candidate for flipped. Intuitively, we want to flip the branch just prior to list append (which contradicts our goal of a zero-length list). CutiePy has reasoned about l. 6 in Fig. 1 at bytecode level, concluding that map must return a list whose iterator (v144) will be immediately exhausted, i.e., an empty list.

In two more recursive calls, FLIP propagates constraints to primary inputs, but not with the intended bug-revealing trace. In the first call, line 7 of FLIP identifies $r_j$ = (exhausted (nexted_once v29)) as the next candidate for flipped. CutiePy has determined through bytecode-level reasoning that list should have $\geq 2$ elements. In the next recursive call to FLIP, $Dep$ finally reaches constraints on the primary input i_filename, pending models for os.path.join and os.listdir. Here $\Phi$ is found unsat, and in line 7
$r_j = r_u$ = Not(exhausted(v29)); however, by this point execution has already diverged from our intended one. When the right $r_j$ is picked in line 7, CutiePy will discover the bug in uniq: exhausted(v144) requires filter returning an empty list (again an immediately exhausted iterator), which in turn necessitates map returning an empty list in l. 13, which via a partial axiomatization of os.listdir and an appropriately set up environment leads to the new primary input i_filename= [...]

When forcing execution to a particular point, flipping the right conditions impacts efficiency. A promising heuristic is to focus on loop conditions.

Our initial experiments have shown that, for the dynamic features of Python, a key challenge is tracking the right amount of symbolic information during execution. We show how to do this by lazily constructing and solving constraints, and using complete or partial axiomatizations of library functions as needed. Further evaluation will provide insight into the number and types of bugs that can be automatically found, and how to tune the framework to effectively and efficiently zoom in on the most representative and relevant errors.

References

1. Burnim, J., Sen, K.: Heuristics for scalable dynamic test generation. In: 23rd International Conference on Automated Software Engineering. pp. 443-446. ACM (2008)
2. Cadar, C., Dunbar, D., Engler, D.: KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In: 8th OSDI. USENIX (2008)
3. Godefroid, P., Klarlund, N., Sen, K.: DART: directed automated random testing. In: Programming Language Design and Implementation. pp. 213-223. ACM (2005)
4. de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) TACAS. LNCS, vol. 4963, pp. 337-340. Springer, Heidelberg (2008)
5. Pasareanu, C.S., Rungta, N., Visser, W.: Symbolic execution with mixed concrete-symbolic solving. In: 20th ISSTA. pp. 34-44. ACM (2011)
6. Sen, K., Marinov, D., Agha, G.: CUTE: a concolic unit testing
engine for C. In: 10th ESEC/13th SIGSOFT FSE. pp. 263-272. ACM (2005)
7. Tillmann, N., de Halleux, J.: Pex - white box test generation for .NET. In: Beckert, B., Hähnle, R. (eds.) TAP 2008. LNCS, vol. 4966, pp. 134-153. Springer, Heidelberg (2008)

The Unseen Industry
Marius TIVADAR - malware researcher @ Bitdefender

Beginnings - DOS - 80s
• The bohemian times
  - Innocent creators
    • Demonstrating their skills
    • As a form of protest
    • Fame and glory
  - Brain 1986
  - Michelangelo 1991
    • Every March 6, a tribute to the Renaissance artist
  - One Half 1994

The Brain virus
• Considered the first PC virus, 1986
• Originated in Pakistan, written by the brothers Basit and Amjad
• Boot virus, infected floppy disks

Morris worm
1988, the first virus that spread over the Internet
- Launched from MIT by Robert Morris
- Made for demonstration purposes
- Exploited vulnerabilities in various servers
  - Sendmail, finger, rsh
- First damages, first conviction
  - $10,000 fine
  - 3 years suspended sentence
  - 400 hours of community service
Malware
• Viruses
  - Need a host
• Worms
  - Propagate on their own from one system to another
• Trojan
  - Stand-alone malicious program, does not infect other systems

Virus
[Figure: entry point of a program before and after infection]

Virus - polymorphism
[Figure: viral code in generation +1 and generation +2]

Windows
• Windows 95, 98 appeared
  - Hackers were still learning the system
  - Soon the first PE infector appeared
• Emergence of polymorphic and metamorphic viruses
  - They existed in the DOS era too, but only now was the technique raised to the rank of "art"
    • Evol
    • metaPHOR 2002

Windows worms
ILOVEYOU, 2000
  - Spread through e-mail (Outlook)
  - Written in VBS, easy to modify
  - About 45 million infected systems
  - Estimated damages: millions of $, with no purpose
  - The authors suffered no consequences; the Philippines had no laws on computer crime

Blaster, 2003
• Spread on XP systems
• Used a vulnerability: MS03-026
• As its effect, it attacked the Windows Update site (DDoS)
• "Billy Gates why do you make this possible ? Stop making money and fix your software!!"

Viruses vs. Anti-Viruses

Botnets
• Someone had the idea of keeping track of the infected computers; this is how networks of "bots" were created
• At first IRC was used; all victims connected to a server through this chat protocol
  - Having control over the systems, someone thought of making money out of it

[Figure: botnet in 2000]

Botnets
• Started to become an industry around 2000
• Used for SPAM
• Over 60% of e-mails are spam
EarthLink wins $25 million lawsuit against junk e-mailer

Botnets
• Problems appeared: the C&C servers were shut down too quickly by the authorities
• The solution: Storm botnet, 2007
  - Decentralized network (what does it resemble?)
  - Encrypted protocol

[Figure: P2P botnet, with master and C&C server]

[...] to functions that match several [...] verification [...] without pre-computed [...] the adversary presence is detected [...] Adv can send [...] observation, off-line or non-blocking control produce no different protocol behavior, thus [...] mean that Adv observes at least $b$ distinct independent outputs of $O(\cdot)$ (respectively, observes outputs of $O(\cdot)$ for $b$ chosen inputs) over different protocol [...]

[...] these relations are deduced from the protocol description: if $A \to B : H(N_a)$, then since $N_a$ is [...] between them. For (1), knowing $E_s(id_A, m)$, Adv can guess $s$ by checking for the known value $id_A$ in the input (obtained by inverting [...]). [...] if Adv knows $E_s(m, m)$, he can check [...] whether the result of decryption (inversion) has identical inputs [...]. For (3), knowing $E_s(H(m))$ and [...], Adv inverts both outputs and checks if the two inputs are related by means of $H$. This way of verifying the guess is valid only [...] there exists $k_c$ such that $\forall k > k_c$, $\nu(k)$ [...] to find relations between oracle observations or subterms thereof.

Definition 3. Given a function $h$ and a set of terms $a$, we say there is a relation under $h$ with arguments from $a$, denoted $R(h, a)$, if the adversary can establish an equality $h(\bar{t}) = y$ such that: i) $\bar{t}$, $y$ are terms constructed from the adversary knowledge and two disjoint subsets of terms from $a$, at least one subset non-empty; ii) $h(\bar{t})$ is injective in at least one input that comes from $a$, with all other inputs kept constant.

This relation is used in the guessing lemma.
Condition ii) allows to validate a guess by using at least one term deduced after the guess, while condition i) avoids trivial identities with terms that can result from a wrong guess.

Lemma 1 (Guessing lemma). Let $s \in \{0,1\}^k$ be a low-entropy secret [...] ($2^k$ computation steps are feasible) [...] $q \cdot 2^k$ [...]
ii) if Adv ◄ $b_1 O_E^f(\cdot)$, [...] $b_2 \geq q \cdot 2^k$
iii) if Adv ◄ $b_1 O_E^f(a_i)$, with distinct $a_i$, Adv ctl$_{b_2}\{O(\cdot), O_h(\cdot)\}$, and $R(h, a)$, with $a = (a_1, \ldots, a_n)$, then Adv can guess $s$ with $q$ observations of $O(\cdot)$ and $q \cdot 2^k$ queries to $O(\cdot)$, $O_h(\cdot)$: [...] $b_1 \geq q$, $b_2 \geq q \cdot 2^k$   (4)

B. Groza and M. Minea

Proof sketch. In case i), by Def. 2, for $q$ observations of $O(s, \cdot)$, on average only one $s$ verifies the input-output relation, so $b_1 \geq q$ suffices. Thus, Adv [...] can choose all inputs (the observations become queries), then [...] can do pre-computed [...]

Example 3. Let $E$ be deterministic encryption and $H$ a hash function. Then $E_k(\cdot)$, $H(\cdot)$, and $E(a)$ are injective, and thus [...] according to case [...] of the guessing lemma, which allows us to formalize the attack originally presented in [10] and explain it using a [...] guessing rule.

4.2 The Norwegian ATM

With our calculus we formalize attacks on a Norwegian ATM system, shown to be flawed in [11]. The system attempts to increase password security by hiding the verifier. Cards store the PIN encrypted with a bank key BK, truncated to 16 bits: $[DES_{BK}(PIN)]_{16}$ (simplified, since the PIN is not encrypted directly). To find the PIN of a stolen card, Adv cannot guess the PIN off-line without BK, since for each PIN about $2^{40}$ of $2^{56}$ DES keys match. However, [11] presents a more subtle attack. Adv gets several honest cards from the same bank. Each known PIN reduces the number of candidate keys by a factor of $2^{16}$. On average, 4 honest cards suffice to find BK, and then [...] the PIN on the stolen [...] ATM [...] summarised below: Card issuing
stage: Bank $\to$ User: $[DES_{BK}(PIN)]_{16}$ [...]
PIN change procedure: User $\to$ ATM: $[DES_{BK}(PIN_{old})]_{16}$, $PIN_{new}$ [...]; ATM $\to$ User: $[DES_{BK}(PIN_{new})]_{16}$

We assume [...] securely, otherwise a Dolev-Yao adversary could [...] the PIN directly from the protocol. Since $\log_2 |PIN|$ [...] To find the PIN, Adv must simulate the oracle himself, [...] Adv knows $[DES_{BK}(PIN_{CV})]_{16}$ [...] knows BK [...] Then, [...] is strongly distinguishing in 4 queries since the DES key has 56 bits, and we have: Adv knows $PIN_{CV_i}$, $[DES_{BK}(PIN_{CV_i})]_{16}$ [...]

[...] of our approach [...]

References

1. Ding, Y., Horster, P.: Undetectable on-line password guessing attacks. Operating Systems Review 29(4), 77-86 (1995)
2. Lowe, G.: Analysing protocols subject to guessing attacks. Journal of Computer Security 12(1), 83-98 (2004)
3. Corin, R., Malladi, S., Alves-Foss, J., Etalle, S.: Guess what? Here is a new tool that finds some new guessing attacks. In: Proc. Workshop on Issues in the Theory of Security, pp. 62-71 (2003)

A Calculus to Detect Guessing Attacks

4. Delaune, S., Jacquemard, F.: A theory of dictionary attacks and its complexity. In: Proc. 17th IEEE Computer Security Foundations Workshop, pp. 2-15 (2004)
5. Drielsma, P.H., Mödersheim, S., Viganò, L.: A formalization of off-line guessing for security protocol analysis. In: Baader, F., Voronkov, A. (eds.) LPAR 2004. LNCS (LNAI), vol. 3452, pp. 363-379. Springer, Heidelberg (2005)
6. Corin, R., Doumen, J.M., Etalle, S.: Analysing password protocol security against off-line dictionary attacks. In: Proc. 2nd Int'l Workshop on Security Issues with Petri Nets and other Computational Models (WISP), pp. 47-63 (2004)
7. Abadi, M., Baudet, M., Warinschi, B.: Guessing attacks and the computational soundness of static equivalence. In: Aceto, L., Ingólfsdóttir, A. (eds.) FOSSACS 2006. LNCS, vol. 3921, pp. 398-412. Springer, Heidelberg (2006)
8. Baudet, M.
: Deciding security of protocols against off-line guessing attacks. In: Proc. 12th ACM Conf. on Computer and Communications Security, pp. 16-25 (2005)
9. Blanchet, B.: An Efficient Cryptographic Protocol Verifier Based on Prolog Rules. In: 14th IEEE Computer Security Foundations Workshop, pp. 82-96 (2001)
10. Anderson, R.J., Lomas, T.M.A.: Fortifying key negotiation schemes with poorly chosen passwords. Electronics Letters 30(13), 1040-1041 (1994)
11. Hole, K.J., Moen, V., [...]

[...] Each transition represents one time unit. All computations are performed on states reachable from a predefined set of initial states.

The algorithms described in this work are implemented using symbolic model checking techniques. Boolean formulas can be constructed from the propositional variables of the model. A formula is said to be satisfied in a state if and only if the assignment of variable values in the state to the corresponding variables in the formula makes it true. In general, a formula can be satisfied in many states, and we identify a formula with the set of states that satisfy it. The transition relation can also be represented by a boolean formula constructed from two copies of the propositional variables, one for the current state and one for the next state. There is a transition from state $v$ to state $v'$ if the assignment of the variable values in state $v$ to the current state variables, and the assignment of the variable values in state $v'$ to the next state variables, satisfy the formula.

Our algorithms work on boolean formulas representing sets of states. For example, the formula representing $T(S) = \{s' \mid N(s, s') \text{ holds for some } s \in S\}$, the set of all successors of states in a state set $S$, can be easily constructed from the formula for $S$ and the formula for the transition relation in one step, regardless of the number of states in $S$ and $T(S)$. The fact that all operations consider sets of states instead of individual states is one of the main reasons for the efficiency of our method.
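
As a small illustration of the successor operation $T(S)$ defined above, the following sketch computes it over explicitly enumerated states. This is only an assumption-laden stand-in: the method described here manipulates boolean formulas and never enumerates states, and the function name `image` and the four-state example relation are hypothetical, chosen for illustration.

```python
from typing import Hashable, Iterable, Set, Tuple

State = Hashable


def image(transitions: Iterable[Tuple[State, State]], states: Set[State]) -> Set[State]:
    """Return T(S): every s' such that N(s, s') holds for some s in S.

    `transitions` plays the role of the relation N as a set of (s, s') pairs.
    """
    return {dst for (src, dst) in transitions if src in states}


# Hypothetical 4-state system: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3, 3 -> 3
N = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 3)}

if __name__ == "__main__":
    print(sorted(image(N, {0})))     # successors of the single state 0
    print(sorted(image(N, {1, 2})))  # successors of the set {1, 2}
```

The point of the sketch is that the result for a whole set is obtained in one operation on the set; with a symbolic representation, that one operation does not depend on how many states the set contains.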
Moreover, boolean formulas are implemented by binary decision diagrams (BDDs), enabling the use of efficient algorithms for their manipulation.

We consider the minimum delay algorithm first (figure 2). The algorithm takes two sets of states as input, start and final. It returns the length of (i.e. number of edges in) a shortest path from a state in start to a state in final. If no such path exists, the algorithm returns infinity. Recall that the function $T(S)$ gives the set of states that are successors of some state in $S$. The function $T$, the state sets $R$ and $R'$, and the operations of intersection and union can all be easily implemented using BDDs.

    proc minimum (start, final)
      i = 0;
      R = start;
      R' = T(R) ∪ R;
      while (R' ≠ R ∧ R ∩ final = ∅) do
        i = i + 1;
        R = R';
        R' = T(R') ∪ R';
      if (R ∩ final ≠ ∅) then return i;
      else return ∞;

    proc maximum (start, final)
      i = 0;
      R = Σ;
      R' = not_final;
      while (R' ≠ R ∧ R' ∩ start ≠ ∅) do
        i = i + 1;
        R = R';
        R' = T^{-1}(R') ∩ not_final;
      if (R = R') then return ∞;
      else return i;

Figure 2: Minimum and Maximum Delay Algorithms

The first algorithm is relatively straightforward. Intuitively, the loop in the algorithm computes the set of states that are reachable from start. If at any point we encounter a state that belongs to final, we return the number of steps taken to reach that state.

Next, we consider the maximum delay algorithm. This algorithm also takes start and final as input. It returns the length of a longest path from a state in start to a state in final. If there exists an infinite path beginning in a state in start that never reaches a state in final, the algorithm returns infinity. The function $T^{-1}(S')$ gives the set of states that are predecessors of some state in $S'$ (i.e. $T^{-1}(S') = \{s \mid N(s, s') \text{ holds for some } s' \in S'\}$). We also denote by $\Sigma$ the set of all states, and by not_final the state set $\Sigma - final$. As before, the algorithm is implemented using BDDs; however, a backward search is required in this case.

4 Condition Counting

In many situations we are
interested not only in the length of a path from a set of starting states to a set of final states. We also need to compute measures that depend on the number of states on the path that satisfy a given condition. For example, we may wish to determine the minimum (maximum) number of times a given condition holds on any path from starting to final states.

Both algorithms in this section take as input three sets of states: start, cond and final. The algorithms compute the minimum and the maximum number of states that belong to cond, over all finite paths that begin with a state in start and terminate upon reaching final. To guarantee that the minimum (maximum) is well-defined, we assume that any path beginning in start must reach a state in final in a finite number of steps. This can be checked using the maximum delay algorithm described in the previous section. Finally, we ensure that all computations involve only reachable states, by intersecting start with the set of reachable states computed using standard symbolic model checking algorithms.

To keep track at each step of the number of states in cond that have been traversed, we define a new state-transition system, in which the states are pairs consisting of a state in the original system and a non-negative integer. Thus, if the original state-transition graph has state set $\Sigma$, then the augmented state set will be $\Sigma_a = \Sigma \times \mathbb{N}$. If $N \subseteq \Sigma \times \Sigma$ is the transition relation for the original state-transition graph, we define the augmented transition relation $N_a \subseteq \Sigma_a \times \Sigma_a$ as

$N_a(\langle s, k \rangle, \langle s', k' \rangle) = N(s, s') \wedge \big( (s' \in cond \wedge k' = k + 1) \vee (s' \notin cond \wedge k' = k) \big)$

In other words, there will be a transition from $\langle s, k \rangle$ to $\langle s', k' \rangle$ in the augmented transition relation $N_a$ iff there is a transition from $s$ to $s'$ in the original transition relation $N$ and either $s' \in cond$ and $k' = k + 1$, or $s' \notin cond$ and $k' = k$. We also define $T$ to be the function that for a given set $U \subseteq \Sigma_a$ returns the set of successors of all states in $U$. More formally, $T(U) = \{u' \mid N_a(u, u') \text{ holds for some } u \in U\}$.

In the actual BDD-based implementation, an initial bound $k_{max}$ can be selected to achieve a finite representation for $k$, and new BDD variables can be added dynamically if this bound is exceeded. The system is still finite-state because all paths we consider are finite and $k$ is bounded by their maximum length.

    proc mincount (start, cond, final)
      current_min = ∞;
      R = {⟨s, 1⟩ | s ∈ start ∩ cond} ∪ {⟨s, 0⟩ | s ∈ start ∩ ¬cond};
      loop
        Reached_final = R ∩ Final;
        if Reached_final ≠ ∅ then
          m = min{k | ⟨s, k⟩ ∈ Reached_final};
          if m [...]

[...] these are [...] (virtual), not physical addresses
Running the program repeatedly, addresses differ
  estimate: how many bits vary?
  [...] protects against attacks that need to know address values

[Figure: memory layout of a C program, from high address to low address: command-line arguments and environment variables; stack; heap; uninitialized data (bss), initialized to zero by exec; initialized data, read from program file by exec]
Figure: http://www.geeksforgeeks.org/memory-layout-of-c-program/

[Figure: stack frames for entry, main(), f1() and f2(), where main() calls f1() and f1() calls f2().
 entry's frame: return address (entry); NULL (no previous frame).
 main()'s frame: saved registers used in main(); main()'s local variables; saved parameters to f1(); return address (main+offset); main()'s frame pointer.
 f1()'s frame: saved registers used in f1(); f1()'s local variables; saved parameters to f2(); return address (f1+offset); f1()'s frame pointer.
 f2()'s frame.]
Figure source: backerstreet.com, stack frames page

[...] memory
A mapping from logical to physical addresses
supported by processor hardware (memory management unit) and operating system
- provides [...] (program not concerned with size and usage of physical memory)
  virtual address space can be larger than physical memory
  memory transferred to/from secondary memory (disk) as needed
- provides [...]
  can set up [...] for memory segments
  memory space of one process protected from another
  but: can also set up [...]

Figure 8.3 Address Translation in a Paging System
Figure: W. Stallings, Operating
Systems, 6th ed.

What difference (if any) is there between char s[] = "test"; and char *p = "test"; ?

Array: char s[] = "test";
- s[0] is 't', s[4] is '\0', etc.
- s is an address (char *), not a variable in memory
- CANNOT assign s = ..., but may assign s[0] = 'f'
- sizeof(s) is 5 * sizeof(char)
- &s is the same address as s, but has a different type, address of 5-char array: char (*)[5]
- sizeof (entire array) is not strlen (up to '\0')
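A small sketch of the array-versus-pointer distinction on this slide. The function names are hypothetical and only package each observation so it can be checked, and the pointer is declared const char * (an assumption of this sketch, the slide uses plain char *) so that writing through it to a string constant is rejected by the compiler.

```c
#include <stddef.h>
#include <string.h>

/* sizeof applied to the array counts every byte, including the final '\0'. */
size_t array_bytes(void)
{
    char s[] = "test";
    return sizeof s;            /* 5 * sizeof(char) */
}

/* strlen counts characters only up to (not including) the final '\0'. */
size_t string_chars(void)
{
    char s[] = "test";
    return strlen(s);
}

/* Elements of s are writable, and a pointer may be redirected to s. */
char first_char_after_edit(void)
{
    char s[] = "test";
    const char *p = "ana";      /* p is a variable holding an address */
    s[0] = 'f';                 /* allowed: modifies the array's own copy */
    p = s;                      /* allowed: p now points at the array */
    return p[0];                /* reads s[0] through the pointer */
}
```

Calling these from a main of your own shows the array occupying one byte more than its string length, while sizeof applied to p would only give the size of a pointer.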
Pointer: char *p = "test";
- p[0] is 't', p[4] is '\0' (same)
- p is a (char *), has a memory location
- CANNOT assign p[0] = 'f' ("test" is a string constant)
- can do p = s; then p[0] = 'f';
- can assign p = "ana";
- sizeof(p) is sizeof(char *)
- &p is NOT p => WRONG: scanf("%4s", &p); RIGHT: scanf("%4s", p); (if p is a valid address and has room)

The name of an array is a constant address
Can declare an array a[LEN] and a pointer *pa of the same element type, and assign pa = a;
Similar: a and pa have the same type (pointer to the element type)
But: pa is a variable and uses memory; a is a constant (array has fixed address)
  pa = addr is allowed; a = addr is not
*a and *pa: indirections with different operations in machine code:
- *a references the object from a constant address (direct addressing)
- *pa must first get the value of variable pa (an address), loading it from the constant address &pa, then dereference it (indirect addressing)

Suppose we want to process a bitmap file

Bitmap file header
This block of bytes is at the start of the file and is used to identify the file. A typical application reads this block first to ensure that the file is actually a BMP file and that it is not damaged. The first 2 bytes of the BMP file format are the character "B" then the character "M" in ASCII encoding. All of the integer values are stored in little-endian format (i.e. least-significant byte first).

Offset (hex)  Offset (dec)  Size     Purpose
00            0             2 bytes  The header field used to identify the BMP and DIB file is 0x42 0x4D in hexadecimal, same as BM in ASCII. The following entries are possible: BM - Windows 3.1x, 95, NT, etc.; BA - OS/2 struct bitmap array; CI - OS/2 struct color icon; CP - OS/2 const color pointer; IC - OS/2 struct icon; PT - OS/2 pointer
02            2             4 bytes  The size of the BMP file in bytes
06            6             2 bytes  Reserved; actual value depends on the application that creates the image
08            8             2 bytes  Reserved; actual value depends on the application that creates the image
0A            10            4 bytes  The offset, i.e. starting address, of the byte where the bitmap image data (pixel array) can be found
https://en.wikipedia.org/wiki/BMP_file_format

Offset (hex)  Offset (dec)  Size (bytes)  Windows BITMAPINFOHEADER
0E            14            4             the size of this header (40 bytes)
12            18            4             the bitmap width in pixels (signed integer)
16            22            4             the bitmap height in pixels (signed integer)
1A            26            2             the number of color planes (must be 1)
1C            28            2             the number of bits per pixel, which is the color depth of the image. Typical values are 1, 4, 8, 16, 24 and 32
1E            30            4             the compression method being used. See the next table for a list of possible values
22            34            4             the image size. This is the size of the raw bitmap data; a dummy 0 can be given for BI_RGB bitmaps
26            38            4             the horizontal resolution of the image (pixel per meter, signed integer)
2A            42            4             the vertical resolution of the image (pixel per meter, signed integer)
2E            46            4             the number of colors in the color palette, or 0 to default to 2^n
32            50            4             the number of important colors used, or 0 when every color is important; generally ignored

To work with ints that are 2 bytes, 4 bytes, etc., need stdint.h (since C99):
int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t

BMP specification: "all integers are stored in little-endian format"
little-endian: least-significant byte first
  0x12345678 is stored as 0x78 0x56 0x34 0x12
  Intel x86
big-endian: most-significant byte first
  0x12345678 is stored as 0x12 0x34 0x56 0x78
  Mac, PPC, Sun, Internet (also called 'network byte order')
Make sure values are read/written from/to file in correct byte order

Analysis infrastructures
Allow program representation and manipulation at source or binary level
Built-in analyses + API to write your own
LLVM: one of the most widely used, complete compiler toolchain
PIN (Intel): run-time instrumentation of binary code
BAP (D. Brumley, CMU): OCaml + Python bindings; team won DARPA Cyber Grand Challenge 2016
angr (UC Santa Barbara): Python framework, also used in Cyber Grand Challenge
CIL (G. Necula, Berkeley): OCaml + Perl; outputs instrumented C code

Example: statement representation in CIL analysis infrastructure
  stmtkind =
    | Instr of instr list
    | Return of exp option * location
    | Goto of stmt ref * location
    | Break of location
    | Continue of location
    | If of exp * block * block * location
    | Switch of exp * block * (stmt list) * location
    | Loop of block * location * (stmt option) * (stmt option)
    | Block of block

Example: LLVM analysis infrastructure
  int delta(int a, int b, int c) { return b*b-4*a*c; }
LLVM internal representation:
  define i32 @delta(i32 %a, i32 %b, i32 %c) #0 {
    %1 = mul i32 %b, %b
    %2 = shl i32 %a, 2
    %3 = mul i32 %2, %c
    %4 = sub i32 %1, %3
    ret i32 %4
  }
To instrument code, traverse statements (control flow graph), identify interesting statements, insert new ones
e.g. can log all/some memory writes

Address sanitizer (with recent clang / gcc versions)
  int main(void) {
    char *p = malloc(20);
    strcpy(p, "...");
    puts(p);
    free(p);
    p[1] = ...;
  }
  % gcc -fsanitize=address usefree.c
  % ./a.out
  ==31741==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300000efe1 at pc 0x0000004008c6 bp 0x7ffeef2227b0 sp 0x7ffeef2227a8
  WRITE of size 1 at 0x60300000efe1 thread T0
      #0 0x4008c5 in main /home/marius/curs/bitdef/usefree.c:11

Szekeres, Payer, Wei, Song. SoK: Eternal War in Memory, IEEE S&P 2013
automated vulnerability detection + exploit generation
comparison of old (buggy) + patched program versions => exploit generation
'compilers' for return-oriented programming exploits, etc.
A good read (insights into research advances):
G. Vigna et al., (State of) The Art of War: Offensive Techniques in Binary Analysis, IEEE Security & Privacy, 2016

Introduction to Formal Methods
October 6, 2005
• Errors and their sources
• What are formal methods?
• Techniques and applications
Formal verification, Lecture 1. Marius Minea

- be able to verify correct behavior of designed systems
- detect main error types and sources
- use formal methods as an alternative to simulation and testing
- use rigor in the description of systems
- build appropriate models for the systems under design
- unambiguously express specifications for desired properties
- evaluate applicability of formal methods for a particular design
- know and be able to use several verification tools

Therac-25
- medical radiation therapy machine
- 6 massive overdoses leaving several dead (1985-87, USA/Canada)
- cause: errors in the control program, no hardware safety backup
[Leveson 1995]:
• excessive trust in software when designing system
• lack of hardware interlocks
• lack of appropriate practices (defensive design, specification, documentation, simplicity, formal analysis, testing)

Ariane 5
- Self-destructed due to malfunction 40 seconds after launch (1996)
- Cause: 64-bit float -> 16-bit int conversion generated uncaught exception in its Ada program
- Cost: $500 M (rocket), $7 billion (project)
• main cause: code taken over from the Ariane 4, without judicious analysis
  - execution was no longer necessary at moment of error
  - no analysis of overflow for unprotected variables
• bad system design: the inertial reference system and the backup system affected by the same error

Pentium: error in the floating point division unit (1994)
• SRT division algorithm, generates 2 quotient bits per cycle (base 4)
• uses a lookup table to determine next quotient digit
• a few entries erroneously marked as "don't care" => wrong values
• Cost: ca. $500 million
• Circuit could have been formally verified at that time
  - by automated theorem proving [Clarke, German, Zhao]
  - or with special data structures for multiplication [Bryant, Chen]
• but other more complex components were verified instead (instruction execution, cache coherence)

Mars Pathfinder, 1997
• Problem: on Mars, space probe was resetting frequently
• Cause: priority inversion between processes sharing common resources
• issue and solution were well known in literature! [Sha, Rajkumar, Lehoczky. Priority Inheritance Protocols, 1990]
1. Process A (low priority) requests resource R
2. A interrupted by C (high priority)
3. C waits for R to be freed; switch back to A
4. A interrupted by B (medium priority, A < B < C)
=> C waits for lower priority B, without directly depending on it!
Solution: raising the priority of a process (A) that obtains a resource to the level of the highest priority process (C) that can request the resource

Mars Climate Orbiter, 1998
• disintegrated upon entry to Mars atmosphere
• technical error: mismatch between Anglo-Saxon and metric units
• multiple process errors, including between modules

Mars Polar Lander, 1998
• landing gear prematurely activated upon entry to atmosphere
• resulting shock is interpreted as landing, engines are stopped

Testing: directly on the product => tests have immediate relevance
  errors detected late are costly
  diagnosis needs complete observability
Simulation: can be performed through the design stage
  simulator can be significantly slower than real system
Exhaustive testing and simulation is often impossible

What are formal methods?
"mathematically-based languages, techniques and tools for specifying and verifying [...] systems" [Clarke & Wing, 1996]
Or, in more detail: "a set of tools and notations
- with a formal semantics,
- used to unambiguously specify the requirements of a system
- that allow proving properties of that specification
- and proving the correctness of an implementation with respect to that specification"
[Hinchey & Bowen, Applications of Formal Methods, 1995]
• there are no absolute guarantees
• a formal method cannot be better than the employed model and the specifications
  - model and specifications have to be validated
However, formal methods can offer:
• a logically consistent way of reasoning
• exhaustive coverage, often impossible to achieve by other means
• mechanization and automation => performance and correctness
They can successfully complement simulation, testing, etc.
(E. W. Dijkstra, 1979)

Usefulness especially in case of:
- abstraction / approximation techniques
- errors that are difficult to reproduce and analyze otherwise
- critical domains (avionics, banking, medicine, security)

Error dynamics in software development [John Rushby, SRI]
• 20-50 errors/kloc before testing -> 2-4 errors/kloc after
• formal code inspection can reduce before-testing errors 10-fold
Case study on 10 kloc distributed real-time code:
• verification and validation: 52% cost (57% time)
• of this, 27% cost in inspection, 73% in testing
• 21% due to 4 defects uncovered in final testing (one of these originated in design phase)
• error elimination in detailed code inspection: 160 times more efficient than in testing!
[NASA JPL (Voyager and Galileo probes)]
• majority: deficiencies in requirement and interface specification
• 1 error in 3 pages of requirements and 21 pages of code
• only 1 in 3 were programming errors
• 2/3 of functional errors: omissions in requirement specifications
• majority of interface errors: due to bad communication

• Most frequent error causes: conceptual errors, simultaneous defects, unforeseen interactions
  - main shortcomings: in timely application of formal methods
  - main cost: late error removal
• Maximum potential of formal methods:
  - in high-level modeling and verification
  - for complex, concurrent, distributed, reactive, real-time, fault-tolerant systems

• Requirement analysis
  - can identify contradictions, ambiguities, omissions
• Design
  - decomposing into components and specifying interfaces
  - design by successive refinement
• Verification
• Testing and debugging
  - model-based test case generation
• Analysis
  - abstract model, less complex than real system

Formal verification of:
• Hardware
  - Combinational circuits
  - Sequential circuits
• Software (generally speaking)
• Communication protocols
• Security protocols
• Real-time systems
• Concurrent and distributed systems

Two main categories:
Model checking
- system is represented as a finite-state machine
- specification: reachability (no error state reached), or more complex (temporal logic formula)
- uses exhaustive state space exploration algorithms
- answer: "correct" or counterexample execution sequence
Theorem proving
- model represented in logical system with axioms and deduction rules
- application analysis domain represented likewise (a theory)
- mechanized theorem proving: automated or manual

• Abstraction: most important, reduces verification complexity
• On-the-fly state space construction and state space reduction
• Symbolic state space representation
• Refinement checking
• Compositional verification
• Assume-guarantee reasoning

• Verification of combinational equivalence
  - major success, became standard in all CAD tools
• Verification of sequential designs
  - large companies have dedicated research groups (IBM, Intel, Motorola, Fujitsu, Siemens, etc.)
  - use publicly available verifiers or their own in-house tools
• cache coherence protocols: Gigamax, IEEE Futurebus+
• Motorola 68020: modeled in Boyer-Moore theorem prover; verification of binary code produced by compilers
• AAMP-5 (avionics processor): modeled in PVS theorem prover; verification of microcode for instruction execution
• modeling / verification of DLX-type pipelined / superscalar processors

- Ada code with annotations in SPARK language analyzed
- result: "correct by construction" software, reduced cost
TCAS (Traffic Collision Avoidance System)
• mandatory on all U.S. commercial aircraft
• implements automatic alert and course change if dangerously close
• specification expressed in a formal language (RSML)
• completeness and consistency were verified [Heimdahl, Leveson '96]
• result: English-language description abandoned in favor of completely formal specification
- Cousot et al. (2003) proved complete absence of runtime errors in main flight control software using a static program analyzer
=> formal models of complex systems are feasible
=> can be analyzed by experts from the application domain

• Telephony: specification
and analysis of interactions between various features of the telephone system
• Consumer electronics: manual and later automatic verification of a control protocol for Philips audio components
• Control systems in automotive electronics
• Communication protocols (untimed and timed)
• Security protocols: analysis using special logics to reason about encrypted messages, intruders, etc.
• System software: verification of device drivers

• Specification is needed in any formal method
  can be the only aspect of the method (no analysis or verification)
• requires a language with formally (mathematically) defined syntax and semantics
A specification language defines:
- a syntactic domain (the formal notation)
- a semantic domain (the universe of regarded objects)
- a precise definition of objects that satisfy a specification
[M. Chechik, Automated Verification, lecture notes, U. Toronto]

The syntactic domain:
- an alphabet of symbols (e.g. propositions, logical operators)
- grammar rules for creating well-formed formulas
The semantic domain varies according to the language:
- state sequences, event sequences, traces, synchronization structures (in specification languages for concurrent systems)
- input/output functions, relations, computations, predicate transformers (for programming languages)

Kinds of specifications:
• specifications that need not represent a computable function
• specifications that do represent one (e.g. programming languages)
• property-oriented specifications (e.g., functionality, reactivity)
  - describe system behavior with respect to properties that must be satisfied
• model-oriented specifications (e.g. diagrams, connectors, hierarchy)
  - build a model of the system using precise mathematical notions (sets, functions, predicate logic)
Sometimes, the same language is used for specification and model (implementation)
=> it is possible to do refinement with successive abstraction levels

• unambiguous: has a
well-defined meaning (NOT: language without formal semantics, natural language, graphical schemes with several interpretations)
• consistent (non-contradictory)
  - there exists at least an object that satisfies it
• may be incomplete
  - can be nondeterministic or leave behavior up to implementation
If the language has a system for logical inference, one can prove properties starting from the specification (before building a model)

Z
- based on first-order logic and set theory
- functional, declarative description
- used extensively for industrial projects in the U.K.
Larch [Guttag, Horning, Garland, MIT / DEC SRC]: description with 2 parts / languages
1. language-independent abstraction (specification)
2. interface specification for modules in a given language

Example (Z):
  PhoneDB
    $members : \mathbb{P}\,Person$
    $phones : Person \leftrightarrow Phone$
    $\mathrm{dom}\,phones \subseteq members$
  FindPhones
    $\Xi PhoneDB$
    $name? : Person$
    $numbers! : \mathbb{P}\,Phone$
    $name? \in \mathrm{dom}\,phones$
    $numbers! = phones(\!|\{name?\}|\!)$
- a schema (PhoneDB): state description (states + possibly transitions) and an invariant
- operations that change the state ($\Delta$) or don't ($\Xi$)

Example (Larch trait):
  Table: trait
    includes Integer
    introduces
      new: -> Tab
      add: Tab, Ind, Val -> Tab
      lookup: Tab, Ind -> Val
    asserts \forall i, i1: Ind, v: Val, t: Tab
      not (i \in new);
      i \in add(t, i1, v) == i = i1 \/ i \in t;
      lookup(add(t, i, v), i1) == if i = i1 then v else lookup(t, i1)

Interface specification for the C language
  mutable type table;
  uses Table(table for Tab, char for Ind, char for Val, int for Int);
  constant int maxTabSize;
  table table_create(void) {
    ensures result' = new /\ fresh(result);
  }
  char table_read(table t, char i) {
    requires i \in t^;
    ensures result = lookup(t^, i);
  }
- defines preconditions and postconditions
- interface stays at abstract level (without algorithms)

VDM
- originates from the efforts of the IBM Vienna group in the 70's
- similar and related to Z
B
- developed by Jean-Raymond Abrial (France)
- as opposed to Z, has strong automated tool support
- preconditions / postconditions, invariants, refinement
- support for automated code generation
- industrial usage (Paris metro, Alsthom, n × 10 kloc)
Interface specification notions have been directly incorporated in some programming languages, e.g., Eiffel (design by contract)

Two main approaches:
- traditional imperative programming + add-ons for concurrency (semaphores, monitors, rendezvous communication, etc.)
- concurrent computation model, based on process interaction ("indivisible interaction")
Communication and concurrency are complementary notions [Milner]
• Communicating Sequential Processes [Hoare]
• Calculus of Communicating Systems [Milner]
Example [Hoare]: chocolate vending machine with coins
Alphabet: $\alpha V = \{in1p, in2p, small, large, out1p\}$
Behavior: $V = (in2p \to (large \to V \mid small \to out1p \to V) \mid in1p \to small \to V)$
or, formally: $V = \mu X.\,(in2p \to (large \to X \mid small \to out1p \to X) \mid in1p \to small \to X)$ (unique solution of above equation)
CSP: formalism (process algebra) centered on actions, with nondeterminism, synchronous composition, etc.

• Variants:
  - labels on states or on transitions
  - transitions specified as functions or relations
  - augmented or not with variables (data)
• Kripke structure: automaton labeled with atomic propositions from a set AP: $M = (S, S_0, R, L)$
  - $S$: finite set of states
  - $S_0$: set of initial states
  - $R \subseteq S \times S$: total transition relation
  - $L : S \to 2^{AP}$: state labeling function

• Generally: the system satisfies the specification
• Behavior is correct
  - system is seen as implementing an input/output function
  - example formalism: Hoare triples {P} S {Q}
    { precondition } program (system) { postcondition }
Sample reasoning:
$$\frac{\{P\}\ S_1\ \{Q_1\} \qquad Q_1 \Rightarrow Q_2 \qquad \{Q_2\}\ S_2\ \{R\}}{\{P\}\ S_1; S_2\ \{R\}}$$
Correct behavior
• for reactive systems: conceptually infinite execution
• behavior is defined by a reaction to an input sequence
• specification: e.g. temporal logic
• properties: absence of deadlock, time-bounded reaction, etc.
Examples:
- any request is followed by a response within at most 5 seconds
- any process obtains the resource an infinite number of times
- on any trajectory, at some point a stable state is reached

Two main categories / approaches:
Model checking
- specification usually given in temporal logic
- exhaustive state-space exploration algorithms verify the truth value of the formula or produce an execution trace as counterexample
- equivalence checking: specification is also a (more
abstract) model
Theorem proving
- representation in a logical system with axioms and deduction rules
- the analyzed domain is also represented by axioms and rules (a theory)
- mechanized theorem proving: manually guided or automated

Marius Minea
marius@cs.upt.ro
http://cs.upt.ro/~marius/curs/cp/
26 September 2017

no prior knowledge needed
for those who know, hopefully learn more
imperative programming in C
some insight into alternatives
handle errors
test your code
think of corner cases

C was developed in 1972 at AT&T Bell Laboratories by Dennis Ritchie, together with the UNIX operating system and its tools (C first developed under UNIX, then UNIX was rewritten in C)
Brian Kernighan, Dennis Ritchie: The C Programming Language (1978)
Mature language, but still evolving
ANSI C standard, 1989 (American National Standards Institute)
then ISO 9899 standard (versions: C90, C99, ...)
- direct access to data representation, freedom in working with memory, good hardware interface
- large code base (libraries for many purposes)
- good compilers that generate compact, fast code
- but: very easy to make errors

A program takes input data and, through (mathematical) computations, produces results
In mathematics, computations are expressed by functions:
  we use predefined functions (sin, cos, etc.)
  we define new functions (for the given problem)
  we combine functions into more complex computations
In programming, we use functions in a similar way
Programs are structured into functions (methods, procedures)
Splitting into functions helps: NOT one huge piece of code
Functions can be reused, making development efficient
Functions are core for the functional programming paradigm: computation is function evaluation, not
assignment
Functions are core to defining what is computable (recursive functions, lambda calculus)

Squaring for integers:
  sqr : Z -> Z, sqr(x) = x · x
  int sqr(int x) { return x * x; }
  (function type, function name, parameter type and name)
A function contains:
- the function header, specifying: the type (range) of function values (int), function name (sqr) and parameters (the integer x)
- the function body, within { }: here, the return statement with an expression that gives the function value from its parameters
There are precise rules for writing in the language (the syntax): language elements are written in a given order; separators are used to precisely delimit them: ( ) ; { }
concrete syntax: detail (keywords, punctuation) vs. abstract syntax: essence (language elements / concepts)
Essence:
- names: function, parameter(s)
- types: of parameter(s) and return value
  cannot omit (some languages: can infer types)
  one precise type (some languages: polymorphism, overloading)
- expression (what is computed)
Details (concrete syntax): keyword return, punctuation: { ; order (types first)

Squaring for reals:
  sqrf : R -> R, sqrf(x) = x · x
  float sqrf(float x) { return x * x; }
Another function domain and range (reals) => a different function
even the * operator is now defined on a different set (type)
Need different name to distinguish from sqr in the same program
int and float denote types
A type is a set of values together with a set of operations allowed for these values
For reals, it is preferable to use the type double (double precision) (used by library functions: sin, cos, exp, etc.)

Numeric types differ in C and
mathematics
in math: Z ⊂ R, both are infinite, R is dense / uncountable
in C: int, float, double are finite: integers and reals both have limited range, reals have limited precision
important to remember this! (overflows, precision loss)
default math functions use double, you should too!
The type of numeric constants depends on their writing:
  2 is an integer, 2.0 is a real
  scientific notation for reals: 1.0e-3 instead of 0.001
  1.0 and 1. are equivalent, same for 0.1 and .1

Arithmetic operators: + - * /
Multiplication must be written explicitly! We can't write 2x, but 2 * x (or x * 2)
Some operators have different meanings for integers and reals, and different results!
Division of integers has an integer result! (division with remainder)
  7 / 2 is 3, but 7.0 / 2.0 is 3.5
  -7 / 2 is -3, likewise -(7 / 2) (integer division truncates towards zero)
The operator % (remainder) is only defined for integers
   9 /  5 =  1     9 %  5 =  4
   9 / -5 = -1     9 % -5 =  4
  -9 /  5 = -1    -9 %  5 = -4
  -9 / -5 =  1    -9 % -5 = -4
Rule for integer division: a == a / b * b + a % b => sign of remainder is same as sign of dividend

Keywords: have a predefined meaning (cannot be changed)
  Examples: statements (return), types (int, float, double)
Identifiers (e.g. sqr, x): chosen by the programmer to name functions, parameters, variables, etc.
An identifier is a sequence of characters comprised of letters (upper and lower case), underscore and digits, which does not start with a digit and is not
a keyword
  Examples: x3, a12_34, exit, main, printf, int16_t
Constants: integer: -2; floating point: 3.14; also character and string constants
Punctuation, with various meanings:
  * is an operator
  ; terminates a statement
  parentheses ( ) around an expression or function parameters
  braces { } group declarations or statements

Example: the discriminant of a quadratic equation a·x² + b·x + c = 0
  double discrim(double a, double b, double c) { return b*b - 4*a*c; }
Between the parentheses ( ) of the function header there can be arbitrarily many comma-separated parameters, each with its own type
must give type for each parameter, even if types are the same

So far, we have only defined functions, without using them
The value of a function can be used in an expression
Syntax: like in mathematics: function(param, param, ..., param)
Example: using the previously defined sqr function we can define:
  int cube(int x) { return x * sqr(x); }
IMPORTANT: in C, any identifier must be declared before use (we must know what it represents, including its type)
The above examples assume that sqrf and sqr are defined before discrim and cube respectively in the program

  int main(void) { return 0; }
The smallest program: it does not do anything!
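The declare-before-use rule can be sketched by placing the functions from the previous slides in one file, each defined before the function that calls it. This is an illustrative arrangement, not the original slide code: it assumes double for the real-valued functions and writes discrim in terms of sqrf, as the remark about definition order suggests.

```c
/* Square of an integer. */
int sqr(int x)
{
    return x * x;
}

/* Square of a real; double is an assumption of this sketch. */
double sqrf(double x)
{
    return x * x;
}

/* Uses sqr, which is already defined above. */
int cube(int x)
{
    return x * sqr(x);
}

/* Discriminant of a*x^2 + b*x + c = 0; uses sqrf, defined above. */
double discrim(double a, double b, double c)
{
    return sqrf(b) - 4 * a * c;
}
```

Moving cube above sqr, with no earlier declaration of sqr, would make the compiler complain about an undeclared identifier. A main function added at the end of the file could call any of these.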
Any program contains the function main and is executed by calling it at program start
In main, other functions may be called
Here, main does not have any parameters (void)
void is a keyword for the empty type (without any element)
main returns an int, interpreted as exit status by operating system: 0 = successful termination, != 0 is an error code
return 0; at the end of main is optional (if end brace is reached, 0 is returned by default; still most programs have it explicit)

  int main(void) { return 0; }
Programs may contain comments, placed between /* and */ or starting with // until (and excluding) the end of the line
Comments are stripped by the preprocessor. They have no effect on code generation or program execution
Programs should be commented
- so a reader can understand (including the writer, at a later time)
- as documentation (may specify functionality, restrictions, etc.)
- explain function parameters, result, local variables
- specify preconditions, postconditions, error behavior

  #include <stdio.h>
  int main(void) { printf("...\n"); return 0; }
printf (from "print formatted"): a standard library function
  is NOT a statement or a keyword
  is called here with one string parameter
  string constants are written with double quotes " "
  \n denotes the newline character
The first line is a preprocessor directive; it includes the header file stdio.h, which contains the declarations of the standard input/output functions
declaration = type, name, parameters: needed to use the function
implementation (compiled object code): in a library, which is linked at compile-time, loaded at execution time

  #include <math.h>
  #include <stdio.h>
  int main(void) { printf("%f\n", cos(0)); return 0; }

  #include <stdio.h>
  int sqr(int x) { return x * x; }
  int main(void) { printf("%d\n", 2 * sqr(-3)); return 0; }
To print the value of an expression, printf takes
two arguments:
- a character string (format specifier): %d or %i (decimal integer), %f (floating point)
- the expression; its type must be compatible with the specified one (the programmer must check! the compiler may or may not warn)

Sequencing: in a function, statements are executed in textual order.
But: the return statement ends function execution (no further code is executed).

We cannot print a number like this: printf(5)
We can write printf("5"), but this means printing a string (although the effect is the same: one character printed).
The first argument of printf must always be a string, with or without format specifiers (special characters).

Two distinct things:
- function definition: int sqr(int x) { return x * x; }
- function call: sqr(2), sqr(a), etc.
Function definitions use names (of parameters, variables, etc.).
Function calls work with values (2, the value of a, etc.); they do not compute with symbolic expressions.

This program computes $2^6 = (2 \cdot 2^2)^2$:

    int sqr(int x)
    {
      printf("the square of %d is %d\n", x, x*x);
      return x * x;
    }

    int main(void)
    {
      printf("2 to the 6th is %d\n", sqr(2 * sqr(2)));
      return 0;
    }

What is the order of the printed lines?
    the square of 2 is 4
    the square of 8 is 64
    2 to the 6th is 64

In C, function arguments are passed by value:
- all function arguments are evaluated (their value is computed)
- the values are assigned to the formal parameters (names from the function header)
- then the function is called and executes with these values
This type of argument passing is named call by value.

The program starts executing main. The first statement is
    printf("2 to the 6th is %d\n", sqr(2 * sqr(2)));
Before doing the call, printf needs its arguments:
- the first argument: the value is known (a string constant)
- the second argument: need to call sqr(2 * sqr(2)); the outer sqr also needs the value of its argument 2 * sqr(2) => need to call sqr(2) first
Call order: first sqr(2), then sqr(8), then printf.

C does NOT do the following (other languages might):
Functions do NOT start execution without computed arguments. If they did, printf would print "2 to the 6th is ", then need
the value; it would call the outer sqr, which writes "the square of ", then would need x; it would call sqr(2), write "the square of 2 is 4", return 4, etc.

Function parameters are NOT substituted with expressions. If they were, printf would call the outer sqr with the expression 2 * sqr(2), and sqr(2) would be called twice, for (2*sqr(2))*(2*sqr(2)).
=> in C, a function computes with values, never with expressions.

$$\mathrm{abs} : \mathbb{Z} \to \mathbb{Z}, \qquad \mathrm{abs}(x) = \begin{cases} x & \text{if } x \ge 0 \\ -x & \text{otherwise } (x < 0) \end{cases}$$

=> we need a language construct that decides which expression to evaluate, based on a condition (true/false).

Syntax of the conditional expression: condition ? expr1 : expr2
- if the condition is true, only expr1 is evaluated; its value becomes the result of the entire expression
- if the condition is false, only expr2 is evaluated and its value becomes the value of the expression

    int abs(int x)
    {
      return x >= 0 ? x : -x;
    }

Comparison operators: == (equality), != (different), <, <=, >, >=
Truth values: a condition is deemed true if it evaluates to anything nonzero, and false if it evaluates to 0.
Comparison operators produce 1 (true) or 0 (false).
IMPORTANT! The equality test in C is == and not a simple = !
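The rules above (one branch evaluated, comparisons yielding 1 or 0, any nonzero value counting as true) can be tried out with a few small functions. This is a minimal sketch: the names my_abs, is_equal and truth_of are chosen here for illustration and are not from the lecture; my_abs is named so as not to clash with the standard abs.

```c
/* Absolute value via the conditional operator: exactly one of the two
   branch expressions (x or -x) is evaluated. */
int my_abs(int x)
{
    return x >= 0 ? x : -x;
}

/* A comparison is itself an expression with value 1 (true) or 0 (false). */
int is_equal(int a, int b)
{
    return a == b;   /* ==, not =: writing a = b would assign b to a */
}

/* Any nonzero value counts as true when used as a condition. */
int truth_of(int v)
{
    return v ? 1 : 0;
}
```

Calling my_abs(-3) yields 3, is_equal(2, 3) yields 0, and truth_of(-7) yields 1, since -7 is nonzero.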
Note: abs exists as a standard function, declared in stdlib.h.

$$\mathrm{sgn} : \mathbb{Z} \to \{-1, 0, 1\}, \qquad \mathrm{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 0 & \text{if } x = 0 \\ 1 & \text{if } x > 0 \end{cases}$$

The conditional operator has only one condition, and two branches.
But: either of the expressions can be arbitrarily complex.
We must decompose the decision based on the value of x.
Decomposition into subproblems: key in problem solving.
We rewrite the function with a single decision at any given point:

$$\mathrm{sgn}(x) = \begin{cases} -1 & \text{if } x < 0 \\ \begin{cases} 0 & \text{if } x = 0 \\ 1 & \text{else } (x > 0) \end{cases} & \text{else } (x \ge 0) \end{cases}$$

    int sgn(int x)
    {
      return x < 0 ? -1 : (x == 0 ? 0 : 1);
    }

Minimum of three numbers, again with one decision at a time:
- if x < y: (if x < z then x, else z)
- else (x >= y): (if y < z then y, else z)

The minimum of two numbers is easily written:

    int min2(int x, int y)
    {
      return x < y ? x : y;
    }

We notice that the structure of min2 is repeated => we can do it more simply:
the result is the minimum between the minimum of the first two numbers and the third => just apply min2 twice!

    int min3(int x, int y, int z)
    {
      return min2(min2(x, y), z);
    }

Marius Minea
25 September 2017

Programming languages
- are one of the oldest CS fields
- remain an important current issue: mainstream languages still appear and evolve (Java, C#, ...), plus lots of other languages
- have an impact on language features (polymorphism, reflection, ...), on program properties (type safety, interference), on implementation (compilation, ...), etc.
- needed in verification, testing, parallelization, certification, performance estimation, ...

SIGPLAN motto: "To explore programming language concepts and tools focusing on design, implementation and efficient use."

Course goals:
- understand the concepts of programming languages and their impact
- learn to reason about languages and programs (semantics, reasoning)
- introduction to current programming language research

"a programming language is a tool which should assist the programmer in the most difficult aspects of his art, namely program design, documentation, and debugging" [Hoare, Hints on programming language design, 1973]

Main programming language conferences (ACM SIGPLAN):
- POPL: Principles of Programming Languages
- PLDI: Programming Language Design and Implementation
- OOPSLA: Object-Oriented Programming, Languages, Systems and Applications (now: SPLASH)
All of them have a "most influential paper award" (10 years later) + best paper award (current year) + 20 years of PLDI
(1979-1999)

Sample topics: symbolic computation; lazy evaluation, closures, higher-order functions and continuations; concurrency, inter-process communication and synchronization; active objects and mobile agents; object views, directed interfaces, and dynamic type systems; reflection and introspection; persistent object systems and garbage collection; error management, assertions and declarative debugging; aspect-oriented programming, generative programming, constraint imperative programming; staged compilation and virtual machines (course, Linköping University)

Functional programming
- simple mathematical foundation: lambda calculus (possibly typed)
- in pure form avoids state and mutable data
"The determined Real Programmer can write functional programs in any language" (paraphrasing Ed Post)
Exercise 1: program without state and variables in C
Exercise 2: simulate state and write an interpreter in Haskell / ML (lab)

Programming encompasses three things:
1. a computation model: a formal system that defines a language and how it is executed
2. a set of programming techniques and design principles used to write programs in that language
3. a set of reasoning techniques for reasoning about programs and calculating their efficiency
[van Roy & Haridi, Concepts, Techniques and Models of Computer Programming]

Programming paradigm = approach to programming based on a mathematical theory or a coherent set of principles
many languages => fewer paradigms => still fewer concepts
Key concepts form a paradigm's kernel language

The functional paradigm (after K. Nørmark, course, Aalborg U.)
- Discipline and idea: mathematics and the theory of functions
- Values produced are non-mutable: it is impossible to change part of a composite value, but one can make a revised copy of a composite value
- Atemporal: no matter when it is done, a computation produces the same value; pure functional programming is side-effect free
- Applicative: all computations are done by applying (calling) functions
- Functions are the natural abstraction (for expression evaluation)
- Functions are first-class values: full-fledged data just like numbers, lists, ...
- Computations driven by needs

A first-class object is one that can be:
- passed as an argument
- returned as a value, and
- stored in a data structure
What is first-class influences your
choices of abstraction: languages with first-class functions can represent data as procedures.
Example: represent environments
- two constructors: the empty environment; enlarge an environment with a (symbol, value) pair
- one observer: give the value of a symbol in an environment

Functional / declarative operations are:
- independent (do not depend on any external execution state)
- stateless (no internal execution state remembered between calls)
- deterministic (same result when given the same arguments)

Why is functional programming important?
- declarative programs are compositional
- naturally concurrent (since stateless)
- reasoning about declarative programs is simple
[van Roy & Haridi]

"This book brings you face-to-face with the most fundamental idea in computer programming: The interpreter for a computer language is just another program" (Hal Abelson, foreword to Friedman, Wand, Haynes, Essentials of Programming Languages)
Writing an interpreter:
- makes you think about fundamental concepts
- defines the meaning of programs (semantics)
=> our first lab assignment

Binding = associating a name (identifier) to an object (expression, value)
- static binding: before running the program (e.g., usual function call)
- dynamic binding: at runtime (e.g., OO virtual method call)
Binding and variable assignment are NOT the same.
Pure functional languages have binding but do NOT have assignment (mutable values).
Rebinding and mutation are NOT the same.

Scope = a context to which objects (names, etc.) are associated; an identifier is visible within its scope.
Static (lexical) scoping:
- determined by program text, not by the runtime execution sequence
- aids modularity, understanding, reasoning (in isolation)
Dynamic scoping:
- scope = remainder of the execution during which the binding is in effect
- each identifier has a stack of bindings (push/pop on entering/exiting a scope)
- meaning of code depends on past execution (of other code)
Some languages allow a choice of static / dynamic scoping (e.g., Perl).

First-class functions can be:
- passed as an argument
- returned as a value, and
- stored in a data structure
Ex.: List.map (fun x -> x + 1) (ML)
     Data.List.map (\x -> x + 1) (Haskell)

Functions that return a function, e.g.,
(+) : int -> int -> int = <fun> (ML)
(+) 3 : int -> int = <fun> (same as fun x -> x + 3)

A function of several parameters can be rewritten through currying (after Haskell Curry):
fun x y -> x + y    is the same as    fun x -> fun y -> x + y

Closure = a function together with an environment defining its free variables
needed to implement static scoping with first-class functions

Python example [cf. Wikipedia]:

    def counter():
        x = 0
        def inc():
            nonlocal x
            x += 1
            print(x)
        return inc

    counter1_inc = counter()
    counter2_inc = counter()
    counter1_inc()  # 1
    counter1_inc()  # 2
    counter2_inc()  # 1
    counter1_inc()  # 3

Marius Minea
September 27, 2017

Course topics:
- Black-box testing (no source access)
- Glass-box / white-box testing (with source access)
  - Generating unit tests
  - Test coverage metrics
- Static analysis (of source code)
- Dynamic analysis
- Testing object-oriented programs
- Testing concurrent programs
- Formal verification of programs / models
- Automated test generation, including model-based testing
- Security testing
- Design of a test plan

Therac-25
- medical system for radiation therapy
- 6 accidents with fatalities and grave injuries (1985-87, US, Canada)
- direct cause: errors in the control program
[Leveson 1995]:
- excessive trust in software when designing the system
- lack of hardware safety measures (interlocks)
- lack of appropriate software engineering practices (defensive design, specification, documentation, simplicity, formal analysis, testing)

Ariane 5
- Self-destruction after a fault 40 seconds from launch (1996)
- Cause: conversion of a 64-bit float to a 16-bit int generated an unhandled overflow exception in the Ada program
- Cost: $500 million (rocket), $7 billion (whole project)
main cause: reuse of code taken from the Ariane 4, without proper analysis:
- execution of faulty
code no longer needed at that flight stage
- Ariane 4 had proved absence of overflow for unprotected variables, but the new settings were different
bad design of redundancy: the inertial reference system and its backup were taken out by the same error

Pentium FDIV: error in the floating-point division algorithm
- SRT division algorithm, base 4
- determines the next quotient digit from a lookup table
- some entries erroneously marked as "don't care"
- cost: some $500 million
The circuit could have been verified formally
- by automated theorem proving, or
- with special data structures to represent multiplication and division
but verification effort was focused on more complex parts (execution unit, cache coherence protocol)

Mars Pathfinder, 1997
- once on Mars, the spacecraft was frequently resetting
- cause: priority inversion between processes with common resources
- the problem and solution had been described in the literature [Sha, Rajkumar, Lehoczky. Priority Inheritance Protocols, 1990]
1. low-priority process A acquires resource R
2. A is interrupted by C (high priority)
3. C awaits availability of R; A resumes execution
4. A is interrupted by long-running B (priorities: A < B < C)

A test discovers (and localizes) an error.
The role of a tester is to find errors
- as early as possible (repair cost increases with time)
- and ensure they get fixed (reports, debugging, maintenance)

Traits of a good tester (Patton, Software Testing):
- They are explorers
- They are troubleshooters
- They are relentless
- They are creative
- They are (mellowed) perfectionists
- They exercise good judgment
- They are tactful and diplomatic (?)
- They are persuasive (!)
A test case must define the expected result or output
- otherwise, we will see what we want to see
A programmer should avoid testing their own program
- psychologically, one does not want to find errors
- exception: unit testing / test-driven development
Corollary: the test group should not be the development group
We need test cases for valid and invalid inputs
Must test that the program does what is needed and doesn't do what it shouldn't
Keep and reuse test cases!
Don't plan the test process assuming there won't be errors!
The probability of finding errors in a piece of code is proportional to the number of errors already found

Software testing is an exercise in evaluating risks
The more errors you find, the more there still are
Pesticide paradox (Beizer): errors become resilient to tests (to find new errors, one needs new tests)
Not all errors found will be corrected
Product specifications are never definitive
Testers are not the most popular project team members :)
Software testing is a technical profession governed by a discipline

Sample brief test report [Marnie Hutcheson, Software Testing Fundamentals]
"As per our agreement, we have tested 67 percent of the test inventory [...] the most important tests in the inventory as determined by our joint risk analysis. The bug find rates and the severity composition of the bugs we found were within the expected range. Our bug fix rate is 85 percent. It has been three weeks since we found a Severity 1 issue. There are currently no known Severity 1 issues open. Fixes for the last Severity 2 issues were regression-tested and approved a week ago. Overall, the system seems to be stable.
The load testing has been concluded. The system failed at 90 percent of the design load. The system engineers [...] will need 3 months to implement the fix. Our recommendation is to ship on schedule, with the understanding that we have an exposure if the system utilization exceeds the projections before we have a chance to install the previously noted fix."

[Cem Kaner, Black-box
software testing course, Florida Inst. of Tech.]
- What do we test? What do we want to achieve? What is the testing mission?
- How do we organize work to achieve the mission?
- The test oracle problem
- When have we tested enough?
- The problem of measurement in testing

"A technical investigation conducted to provide quality-related information about a software product to a stakeholder" [Kaner]
- investigation: active, organized search for information
- technical: experiments, logic, models, algorithms, tools
- software product: everything the client gets (software, hardware, databases, documentation, etc.)
- stakeholders: those with an interest in the success of the product, and of testing

Test goals [Kaner 2003, What is a good test case?]
- Find defects: especially in interesting parts (good coverage)
- Maximize bug count: in limited time
- Block premature release + help make the ship / no-ship decision
- Minimize technical support cost
- Assess conformance to specifications / rules / standards
- Minimize risk (incl. safety-related lawsuit risk)
- Find scenarios where the product works (despite bugs): workarounds
- Assess quality: but: cannot ensure quality just by testing
Testing cannot:
- verify correctness (absence of errors)
- ensure quality (QA is a process issue)

A good test is:
- powerful: high chance of discovering a bug if present
- credible / realistic (to stakeholders): no corner cases (except: safety!)
- representative / likely to be encountered by the customer
- easy to evaluate (is it a bug or not?)
- easy to debug / informative
- appropriately complex (progressive)
- offers insight into some aspect of the product / customer / environment (e.g. detects a change in behavior / performance)

Marius Minea
September 26, 2017

Course topics:
- Security of operating system + applications
- network security
- vulnerabilities and their prevention
- security of web applications
- Cryptography: foundational for all of security
- Security protocols and their modeling: authentication, key generation / exchange, etc.; principles and tools for modeling and analysis

"Security is [...] preventing adverse consequences from the intentional and unwarranted actions of others" [Bruce Schneier, Beyond Fear]
"Computer Security deals with the prevention
and detection of unauthorized actions by users of a computer system" [D. Gollmann]

A security system prevents attacks; possibly also: detection, recovery, repair
Security deals with intentional actions; incidental actions: safety (safety ≠ security!)
Unwarranted actions (from the victim's point of view); they need not be illegal
This implies the existence of an adversary
Thinking about / modeling attacker capabilities is essential (incl. multiple, colluding attackers)

How do we learn security?
- By knowing technical details (operating systems, networks, programming, crypto)
- By thinking like an attacker [cf. Schneier] (technical and social aspects); social engineering: e.g., impersonate maintenance to get access
- By understanding:
  - fundamental notions: what needs to be protected? how? from what attacks?
  - principles (design / construction): general, not necessarily technical

[B. Schneier, Beyond Fear]
- What assets are you trying to protect?
- What are the risks to those assets?
- How well does the solution mitigate those risks?
- What other risks does the solution cause?
- What costs and trade-offs does the solution impose?
Confidentiality
- protecting / hiding information or resources
- typically done through cryptography
- or other undisclosed mechanisms
- not just the content: even the existence of data may be confidential (cf. steganography)
- includes hiding the resources

Integrity = trust in data or resources
- expressed by preventing unauthorized modifications
We distinguish:
- data integrity (of content)
- data origin authentication
Integrity mechanisms:
- prevention mechanisms
  - of unauthorized data manipulation (e.g. from outside)
  - of data manipulation in unauthorized ways (e.g. from inside)
- detection mechanisms
[M. Bishop: Computer Security: Art and Science, Pearson, 2003]

Availability = the ability to use information or a resource in the desired way
A system which is not available can be worse than a nonexistent one
Availability is usually analyzed in the context of some (statistical) assumptions about the environment; if the assumptions are not satisfied, the system may be compromised
Denial of service attacks may be difficult to detect if the traffic (partially) matches the allowed statistical pattern

Other classifications:
PAIN: Privacy, Availability-Authentication, Integrity, Non-repudiation
Parkerian Hexad (Donn Parker, 2002):
- confidentiality
- possession / control (important even without violating confidentiality)
- integrity
- authenticity (of origin or author)
- availability
- utility (ex.: data converted to a useless format; utility ≠ availability)
[Handbook of Applied Cryptography]: signature, authorization, access control, timestamping, witnessing (by someone other than the originator), confirmation, anonymity, revocation, traceability / accountability

Confidentiality, integrity, availability are security services.
We discuss (potential) threats and (real) attacks on those services.
Threat classification [R. Shirey, cf. M. Bishop]
- disclosure
- deception (forcing acceptance of false data)
- disruption = interrupting / stopping normal service
- usurpation = unauthorized control of part of a system

Microsoft STRIDE threat model
- Spoofing identity: impersonating
- Tampering with data: falsifying / attack on integrity
- Repudiation: negating the effect of an action
- Information
disclosure: attack on confidentiality
- Denial of service: attack on availability
- Elevation of privilege: unauthorized additional rights

Types of attacks:
- interception (snooping); in particular: (passive) wiretapping
- modifying / altering data => deception, also interruption / usurpation (gaining control); active wiretapping, man-in-the-middle attack (actively changing content)
- impersonation (masquerading, spoofing)
- repudiation of origin (e.g. in commercial transactions)
- denial of receipt: a form of deception
- delay: could be service interruption, also usurpation
- denial of service

Design principles:
a) Economy of mechanism: keep the design as simple and small as possible; unwanted access paths will not be noticed during normal use => security by design, not as an afterthought
b) Fail-safe defaults: base access decisions on permission rather than exclusion (default deny)
c) Complete mediation: check every access, every time (including in exceptional cases, maintenance, ...), NOT based on previously taken decisions
d) Open design (NOT: security through obscurity) => mechanisms may be publicly checked to gain trust
e) Separation of privilege: separation increases robustness
f) Least privilege: every
program and user should operate with the minimal set of privileges needed for the given task
g) Least common mechanism: minimize common resources, interference among users, the mechanisms on which everything is based
h) Psychological acceptability: mechanisms should not unduly interfere with common activity; if mechanisms are not simple, they will be misused or bypassed
2 additional ones:
- Work factor: compare the needed effort with attacker resources
- Compromise recording: in case of failure, alarm / audit is still useful

Other principles:
- the weakest link determines the security of the entire system
- adequate protection principle: not maximal security, but utility at acceptable risk / cost
- principle of efficiency (cf. acceptability): appropriate, easy to use correctly
- defense in depth: layered protection
[Ninghui Li, CS 426: Computer Security, course, Purdue University]

Attack actions:
- "probe": access a target to determine its characteristics
- "scan": systematically access (probe) several targets
- "flood": repeated access to a target to overload it
- authentication: present an identity for verification and subsequent access
- bypass: circumvent a control / authorization process, using an alternate method to access a target
- spoof / masquerade: assume some other identity
- read
- copy
- steal (take into possession and eliminate the original)
- modify
- delete

Attack results:
- unauthorized (increased) access to a system or network
- information disclosure (attack on confidentiality)
- information corruption (attack on integrity)
- denial of
service (attack on availability)
- theft of resources (unauthorized use): a type of usurping a resource

Further notions:
- error modes: passive vs. active (does not do what it should vs. does what it shouldn't); danger of errors in rare cases
- security imbalances: effect of large-scale technologies
- fragile (brittle) systems vs. systems resilient to errors
- protection methods: adaptive to unforeseen situations
- monocultures (homogeneous systems): vulnerable to the same attack, e.g. the majority of systems is running Windows
- security is a human & social problem

In security, we make assertions (statements) about various entities.
These statements are not absolute, they are based on assumptions.
=> Security is a matter of trust: in whom / what can we trust?
Ken Thompson: Reflections on Trusting Trust (Turing Award Lecture '83)
inserted a trojan into the login program and the C compiler to accept a special password (known by the originator), by using self-reproducing code
"You can't trust code that you did not totally create yourself"
"No amount of source-level verification or scrutiny will protect you from using untrusted code"

Unix file permissions
- every file is owned by a user and a group
- individual permission bits: read, write, execute / search
- 3 groups of bits, for: user, group, others
The meaning for directories is more complex than for files:
- r is needed for read(), readdir(), opendir() => for ls
- x ("search") is needed for chdir() and stat() (any file)

What permissions are needed to read a file?
x on the entire path and r for the file
What permissions are needed for ls -l name?
It needs info from the inode, thus x on the parent directory (also, x on the path); this is independent of the permissions on name.
If name is a directory, ls -l lists its contents (needs r); ls -ld only gives directory info, so the answer is as above.
What permissions are needed to delete a file?
w in the parent directory, as well as x. One need not have w for the file!
What can you do with x on a directory but not r?
You can access a file with a known name, but can't search for a file (e.g. search for a file on a web server).

Special bits:
- sticky bit: for a directory: a file can only be deleted by its owner
- set user ID: execute with the effective ID of the file owner
- set group ID: execute with the effective ID of the file group

A process has (in most newer versions) three user-related identifiers:
- real user ID: (initial) owner of the process
- effective user ID: determines access rights
- saved user ID: used to revert to a previous UID
Normally: ruid = euid = user launching the process
Exception: euid = owner of the loaded executable, when it has the s (setuid) bit set => running with other privileges (e.g. elevated)
(similar for group identifiers)
Q1: Why do we need functions to manipulate UIDs at runtime?
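To make the permission and special bits discussed above concrete, here is a minimal sketch that turns a numeric mode into the nine-character string shown by ls -l. The function name mode_to_string and the enum names are hypothetical, chosen for this illustration; the octal masks are the conventional chmod values, written out here as an assumption rather than taken from <sys/stat.h>.

```c
/* Octal masks for the special bits, as conventionally used with chmod. */
enum { SETUID = 04000, SETGID = 02000, STICKY = 01000 };

/* Writes the 9-character ls-style permission string for mode into out.
   out must have room for 10 chars, including the terminating '\0'. */
void mode_to_string(unsigned mode, char out[10])
{
    const char rwx[] = "rwxrwxrwx";

    /* Bit 0400 is user-read; each shift moves one position to the right:
       user rwx, then group rwx, then others rwx. */
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (0400u >> i)) ? rwx[i] : '-';

    /* Special bits replace the x position; uppercase means x is not set. */
    if (mode & SETUID) out[2] = (mode & 0100) ? 's' : 'S';
    if (mode & SETGID) out[5] = (mode & 0010) ? 's' : 'S';
    if (mode & STICKY) out[8] = (mode & 0001) ? 't' : 'T';
    out[9] = '\0';
}
```

For example, mode 04755 (a setuid executable) yields "rwsr-xr-x", and mode 01777 (a world-writable directory with the sticky bit, as is typical for /tmp) yields "rwxrwxrwt".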
Q2: Why is saving the old UID not left to the programmer?

setuid(val)
- if euid = 0 (root): sets ruid = euid = val (and the saved uid too) => UIDs / privileges are set permanently
- else (euid ≠ 0): can only set euid = val if val is the real or saved uid; ruid and saved uid remain unchanged
Q3: What are the limitations if only this call exists?

seteuid(val)
- allowed only if euid == 0 or if val is one of the three values (euid / ruid / saved)
- sets only euid, does not change ruid and saved uid
- changes can be undone by another seteuid call

A security policy is a statement of what is, and what is not, allowed.
A security mechanism is a method, tool or procedure for enforcing a security policy.
(Bishop, Computer Security: Art and Science)
We need to check whether the mechanism is correct. A mechanism may be:
- safe (does not allow states disallowed by the policy)
- precise (allows exactly what the policy specifies)
- broad (allows more than the policy does)

Access control: a mechanism to allow or deny an entity's access to a resource
"principal" / subject -> request -> guard / monitor -> object
Access control consists of two steps:
authentication: Who made the access request?
: Does subject s have access rights for resource o 7 We distinguish: - a set of subjects or principals S - a set of objects 0 - a set of access modes A Simplest: A = {observe, alter} Usually not enough The Bell-LaPadula model refines this to: A = {execute, read, append, write} When are distinctions between these modes useful ? We distinguish: - a set of subjects or principals S - a set of objects 0 - a set of access modes A Simplest: A = {observe, alter} Usually not enough The Bell-LaPadula model refines this to: A = {execute, read, append, write} When are distinctions between these modes useful ? log: append, without changing prior contents execute encryption, without knowing the key October б, 2005 • Errors and their sources • What are formal methods ? • Techniques and applications Formal verification Lecture 1 Marius Minea introduction to Formal Methods 2 - be able to verify correct behavior of designed systems - detect main error types and sources - use formal methods as an alternative to simulation and testing - use rigor in the description of systems - build appropriate models for the systems under design - unambiguously express specifications for desired properties - evaluate applicability of formal methods for a particular design - know and be able to use several verification tools Formal verification Lecture 1 Marius Minea introduction to Formal Methods 3 - medical radiation therapy machine - 6 massive overdoses leaving several dead (1985-87, USA Canada) - cause: errors in the control program, no hardware safety backup [Leveson 1995]: • excessive trust in software when designing system • • lack of hardware interlocks • lack of appropriate practices (defensive design, specification, documentation, simplicity, formal analysis, testing) Formal verification Lecture 1 Marius Minea introduction to Formal Methods 4 - Self-destructed due to malfunction 40 seconds after launch (1996) - Cause: 64-bit float 16-bit int conversion generated uncaught exception in its 
ADA program - Cost: s500 M (rocket), s7 billion (project) • main cause: • code taken over from the Ariane 4, without judicious analyis - execution was no longer necessary at moment of error - no analysis of overflow for unprotected variables • bad design of system : the inertial reference system and the backup system affected by the same error Formal verification Lecture 1 Marius Minea introduction to Formal Methods 5 Error in the floating point division unit (1994) • SRT division algorithm, generates 2 quotient bits per cycle (base 4) • uses a lookup table to determine next quotient digit • a few entries erroneously marked as "don’t care" => wrong values • Cost: ca s500 million • Circuit could have been formally verified at that time - by automated theorem proving [Clarke, German & Zhao] - or with special data structures for multiplication [Bryant & Chen] • but other more complex components were verified instead (instruc-tion execution, cache coherence) Formal verification Lecture 1 Marius Minea introduction to Formal Methods 6 , 1997 • Problem: on Mars, space probe was resetting frequently • Cause: between processes sharing common resources • issue and solution were well known in literature ! [Sha, Rajkumar, Lehoczky Priority inheritance Protocols, 1990] 1 Process A (low priority) requests resource R 2 A interrupted by C (high priority) 3 C waits for R to be freed; switch back to A 4 A interrupted by В (medium priority, A C waits for lower priority B, without directly depending on it ! 
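The four-step scenario above can be traced with a toy scheduler. This is only a sketch under stated assumptions: the `task` structure, `effective_prio` and `pick_next` are hypothetical names, there is a single resource R, and "inheritance" is modeled in the simplest way, by letting the owner of R run at the highest base priority among the tasks blocked on R. It is not the flight software's scheduler.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char name;       /* 'A', 'B' or 'C', as in the scenario */
    int  base_prio;  /* non-negative; larger value = higher priority */
    int  ready;      /* 1 if the task wants the CPU */
    int  holds_r;    /* 1 if the task currently owns resource R */
    int  waits_r;    /* 1 if the task is blocked waiting for R */
} task;

/* Priority seen by the scheduler. With inherit != 0, the owner of R is
   raised to the highest base priority among the tasks blocked on R. */
static int effective_prio(const task *t, const task *all, size_t n, int inherit)
{
    int p = t->base_prio;
    if (inherit && t->holds_r) {
        for (size_t i = 0; i < n; i++)
            if (all[i].waits_r && all[i].base_prio > p)
                p = all[i].base_prio;
    }
    return p;
}

/* Name of the task a fixed-priority preemptive scheduler runs next,
   or '\0' if no task is runnable. Blocked tasks are never chosen. */
char pick_next(const task *all, size_t n, int inherit)
{
    char best = '\0';
    int best_prio = -1;
    assert(all != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!all[i].ready || all[i].waits_r)
            continue;
        int p = effective_prio(&all[i], all, n, inherit);
        if (p > best_prio) {
            best_prio = p;
            best = all[i].name;
        }
    }
    return best;
}
```

With the state of step 4 (A owns R, C is blocked on R, B has just become ready), `pick_next` without inheritance selects B, which is the inversion; with inheritance it selects A, so R is released sooner and C can proceed.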
Solution: raise the priority of a process (A) that obtains a resource to the level of the highest-priority process (C) that can request the resource

Mars Climate Orbiter, 1998
• disintegrated upon entry into the Mars atmosphere
• technical error: mismatch between English and metric units
• multiple process errors: [...] between modules

Mars Polar Lander, 1998
• landing gear prematurely activated upon entry into the atmosphere
• the resulting shock is interpreted as landing, engines are stopped
• error: [...]

Testing:
- directly on the product => tests have immediate relevance
- errors detected late are costly
- diagnosis needs complete observability
Simulation:
- can be performed throughout the design stage
- the simulator can be significantly slower than the real system
Exhaustive testing and simulation is often impossible
[...] (E. W. Dijkstra, 1979)

What are formal methods?
"mathematically-based languages, techniques and tools for specifying and verifying [...] systems" [Clarke & Wing, 1996]
Or, in more detail: "a set of tools and notations
- with a formal semantics,
- used to unambiguously specify the requirements of a system
- that allow proving properties of that specification
- and proving the correctness of an implementation with respect to that specification"
[Hinchey & Bowen, Applications of Formal Methods, 1995]

Limitations:
• there are no absolute guarantees
• a formal method cannot be better than the employed model and the specifications
  - model and specifications have to be [...]
However, formal methods can offer:
• a logically consistent way of reasoning
• exhaustive coverage, often impossible to achieve by other means
• mechanization and automation => performance and correctness
They can successfully complement simulation, testing, etc.
Usefulness especially in case of:
- [...]: abstraction / approximation techniques
- [...]: difficult to reproduce and analyze otherwise
- [...]: (avionics, banking, medicine, security)

Error dynamics in software development [John Rushby, SRI]
• 20-50 errors / kloc before testing, 2-4 errors / kloc after
• formal code inspection can reduce before-testing errors 10-fold!
Case study on 10 kloc of distributed real-time code:
• verification and validation: 52% of cost (57% of time)
• of this, 27% of cost in inspection, 73% in testing
• 21% due to 4 defects uncovered in final testing (one of these originated in the design phase)
• error elimination in detailed code inspection: 160 times more efficient than in testing!

Error sources [NASA JPL (Voyager and Galileo probes)]
• majority: deficiencies in requirement and interface specification
• 1 error in 3 pages of requirements and 21 pages of code
• only 1 in 3 were programming errors
• 2/3 of functional errors: omissions in requirement specifications
• majority of interface errors: due to bad communication

Conclusions
• Most frequent error causes: conceptual errors, simultaneous defects, unforeseen interactions
  - main shortcomings: in timely application of formal methods
  - main cost: late error removal
• Maximum potential of formal methods:
  - in high-level modeling and verification
  - for complex, concurrent, distributed, reactive, real-time, fault-tolerant systems

Formal methods in the design process
• Requirement analysis
  - can identify contradictions, ambiguities, omissions
• Design
  - decomposing into components and specifying interfaces
  - design by successive refinement
• Verification
• Testing and debugging
  - model-based test case generation
• Analysis
  - abstract model, less complex than the real system

Formal verification of:
• Hardware
  - Combinational circuits
  - Sequential circuits
• Software (generally speaking)
• Communication protocols
• Security protocols
• Real-time systems
• Concurrent and distributed systems

Two main categories:
Model checking
- system is represented as a finite-state machine
- specification: reachability (no error state reached), or more complex (temporal logic formula)
- uses exhaustive state space exploration algorithms
- answer: "correct" or a counterexample execution sequence
Theorem proving
- model represented in a logical system with axioms and deduction rules
- application / analysis domain represented likewise (a theory)
- mechanized theorem proving: automated or manual

Techniques:
• Abstraction: most important, reduces verification complexity
• On-the-fly state space construction and state space reduction
• Symbolic state space representation
• Refinement checking
• Compositional verification
• Assume-guarantee reasoning

Applications: hardware
• Verification of combinational equivalence
  - major success, became standard in all CAD tools
• Verification of sequential designs
  - large companies have dedicated research groups (IBM, Intel, Motorola, Fujitsu, Siemens, etc.)
  - use publicly available verifiers or their own in-house tools
• cache coherence protocols: Gigamax, IEEE Futurebus+
• Motorola 68020: modeled in the Boyer-Moore theorem prover; verification of binary code produced by compilers
• AAMP-5 (avionics processor): modeled in the PVS theorem prover; verification of microcode for instruction execution
• modeling / verification of DLX-type pipelined / superscalar processors

Applications: software
- Ada code with annotations in the SPARK language is analyzed
- result: "correct by construction" software, reduced cost
TCAS (Traffic Collision Avoidance System)
• mandatory on all U.S. commercial aircraft
• implements automatic alert and course change if dangerously close
• specification expressed in a formal language (RSML)
• completeness and consistency were verified [Heimdahl, Leveson '96]
• result: English-language description abandoned in favor of a completely formal specification
- Cousot et al. (1993) proved complete absence of runtime errors in main flight control software using a static program analyzer
=> formal models of complex systems are feasible
=> can be analyzed by experts from the application domain

Other applications
• Telephony: specification and analysis of interactions between various features of the telephone system
• Consumer electronics: manual and later automatic verification of a control protocol from Philips audio components
• Control systems in automotive electronics
• Communication protocols (untimed and timed)
• Security protocols: analysis using special logics to reason about encrypted messages, intruders, etc.
• System software: verification of device drivers
Formal specification
• Specification is needed in any formal method;
  it can be the only aspect of the method (no analysis or verification)
• requires a language with formally (mathematically) defined syntax and semantics
A specification language defines:
- a syntactic domain (the formal notation)
- a semantic domain (the universe of regarded objects)
- a precise definition of the objects that satisfy a specification
[M. Chechik, Automated Verification, lecture notes, U. Toronto]

The syntactic domain consists of:
- an alphabet of symbols (e.g. propositions, logical operators)
- grammar rules for creating well-formed formulas
The semantic domain varies according to the language:
- state sequences, event sequences, traces, synchronization structures (in specification languages for concurrent systems)
- input / output functions, relations, computations, predicate transformers (for programming languages)

Kinds of specifications
• [...] (need not represent a computable function)
• [...] (e.g. programming languages)
• [...] (property-oriented) (e.g. functionality, reactivity)
  - describe system behavior with respect to properties that must be satisfied
• [...] (model-oriented) (e.g. diagrams, connectors, hierarchy)
  - build a model of the system using precise mathematical notions (sets, functions, predicate logic)
Sometimes, the same language is used for specification and model (implementation)
=> it is possible to do refinement with successive abstraction levels

Properties of a specification
• unambiguous: has a well-defined meaning
  (NOT: language without formal semantics, natural language, graphical schemes with several interpretations)
• consistent (non-contradictory)
  - there exists at least one object that satisfies it
• may be incomplete
  - can be nondeterministic or leave behavior up to the implementation
If the language has a system for logical inference, one can prove properties starting from the specification (before building a model)

Z
- based on first-order logic and set theory
- functional, declarative description
- used extensively for industrial projects in the U.K.
Schema PhoneDB:
  $members : \mathbb{P}\, Person$
  $phones : Person \leftrightarrow Phone$
  $\mathrm{dom}\ phones \subseteq members$
Schema FindPhones:
  $\Xi PhoneDB$
  $name? : Person$
  $numbers! : \mathbb{P}\, Phone$
  $name? \in \mathrm{dom}\ phones$
  $numbers! = phones(|\{name?\}|)$
- a schema (PhoneDB) describes the states (+ possibly transitions), and an invariant
- operations that change the state ($\Delta$) or don't ($\Xi$)

Larch [Guttag, Horning, Garland, MIT / DEC SRC]
description with 2 parts / languages:
1. language-independent abstraction (specification)
2. interface specification for modules in a given language
  Table: trait
    includes Integer
    introduces
      new: -> Tab
      add: Tab, Ind, Val -> Tab
      lookup: Tab, Ind -> Val
    asserts \forall i, i1: Ind, v: Val, t: Tab
      not (i \in new);
      i \in add(t, i1, v) == i = i1 \/ i \in t;
      lookup(add(t, i, v), i1) == if i = i1 then v else lookup(t, i1)

LCL: interface specification for the C language
  mutable type table;
  uses Table(table for Tab, char for Ind, char for Val, int for Int);
  constant int maxTabSize;
  table table_create(void) {
    ensures result' = new /\ fresh(result);
  }
  char table_read(table t, char i) {
    requires i \in t^;
    ensures result = lookup(t^, i);
  }
- defines preconditions and postconditions
- the interface stays at an abstract level (without algorithms)

VDM
- originates from the efforts of the IBM Vienna group in the 70's
- similar and related to Z
B
- developed by Jean-Raymond Abrial (France)
- as opposed to Z, has strong automated tool support
- preconditions / postconditions, invariants, refinement
- support for automated code generation
- industrial usage (Paris metro, Alsthom, n × 10 kloc)
Interface specification notions have been directly incorporated in some programming languages, e.g. Eiffel (design by contract)

Specifying concurrent systems
Two main approaches:
- traditional imperative programming + add-ons for concurrency (semaphores, monitors, rendezvous communication, etc.)
- a concurrent computation model, based on process interaction ("indivisible interaction")
Communication and concurrency are complementary notions [Milner]
• Communicating Sequential Processes [Hoare]
• Calculus of Communicating Systems [Milner]

Example [Hoare]: chocolate vending machine with coins
Alphabet: $\alpha V = \{in1p, in2p, small, large, out1p\}$
Behavior: $V = (in2p \to (large \to V \mid small \to out1p \to V) \mid in1p \to small \to V)$
or, formally: $V = \mu X.\,(in2p \to (large \to X \mid small \to out1p \to X) \mid in1p \to small \to X)$
(unique solution of the above equation)
CSP: a formalism (process algebra) centered on actions, with nondeterminism, synchronous composition, etc.

Finite-state machines (automata)
• Variants:
  - labels on states or on transitions
  - transitions specified as functions or relations
  - augmented or not with variables (data)
• Kripke structure = automaton labeled with atomic propositions from a set $AP$: $M = (S, S_0, R, L)$
  - $S$: finite set of states
  - $S_0 \subseteq S$: set of initial states
  - $R \subseteq S \times S$: total transition relation
  - $L : S \to 2^{AP}$: state labeling function

What do we verify?
• Generally: the system satisfies the specification
• Behavior is correct
  - the system is seen as implementing an input / output function
  - example formalism: Hoare triples $\{P\}\ S\ \{Q\}$
    { precondition } program (system) { postcondition }
Sample reasoning:
  $$\frac{\{P\}\ S_1\ \{Q_1\} \qquad Q_1 \Rightarrow Q_2 \qquad \{Q_2\}\ S_2\ \{R\}}{\{P\}\ S_1; S_2\ \{R\}}$$

Correct behavior for reactive systems
• for reactive systems: conceptually infinite execution
• behavior is defined by a reaction to an input sequence
• specification: e.g. temporal logic
• properties: absence of deadlock, time-bounded reaction, etc.
Examples:
- any request is followed by a response within at most 5 seconds
- any process obtains the resource an infinite number of times
- on any trajectory, at some point a stable state is reached

Two main categories / approaches:
Model checking
- specification usually given in temporal logic
- exhaustive state-space exploration algorithms verify the truth value of the formula or produce an execution trace as counterexample
- equivalence checking: the specification is also a (more abstract) model
Theorem proving
- representation in a logical system with axioms and deduction rules
- the analyzed domain is also represented by axioms and rules (a theory)
- mechanized theorem proving: manually guided or automated

Marius Minea
September 26, 2017

Security of:
- operating system + applications
- network security
- vulnerabilities and their prevention
- security of web applications
[...]: foundational for all of security
Security protocols and their modeling
- authentication, key generation / exchange, etc.
- principles and tools for modeling and analysis

"Security is [...] preventing adverse consequences from the intentional and unwarranted actions of others" [Bruce Schneier, Beyond Fear]
"Computer Security deals with the prevention and detection of unauthorized actions by users of a computer system" [D. Gollmann]
A security system prevents attacks; possibly also: detection, recovery, repair
Security deals with intentional actions (incidental actions: safety, not security)
and with unwarranted actions (from the victim's point of view); these need not be illegal
It implies the existence of an attacker, targeting [...]
Thinking of / modeling attacker capabilities is essential, incl. multiple, colluding attackers

By knowing technical details (operating systems, networks, programming, crypto)
By thinking [see Schneier] like an attacker (technical and social aspects)
  social engineering: e.g. impersonate maintenance to get access
By understanding:
  fundamental notions: what needs to be protected? how? from what attacks?
  principles (design / construction): general, not necessarily technical

[B. Schneier, Beyond Fear]
What are you trying to protect?
What are the risks to those assets?
How well does the solution mitigate those risks?
What other risks does the solution cause?
What costs and trade-offs does the solution impose?

Confidentiality
- protecting / hiding information or resources
- typically done through cryptography
- or other undisclosed mechanisms
- not just the content: even its existence may be confidential (cf. steganography)
- includes hiding the resources

Integrity = trust in data or resources
- expressed by preventing unauthorized modifications
We distinguish:
- data integrity (of content)
- data origin authentication
Integrity mechanisms:
- prevention mechanisms
  of unauthorized data manipulation (e.g. from outside)
  of data manipulation in unauthorized ways (e.g. from inside)
- detection mechanisms
[M. Bishop: Computer Security: Art and Science, Pearson, 2003]

Availability = the ability of using information or a resource in the desired way
A system which is not available can be worse than a nonexistent one
Availability is usually analyzed in the context of some (statistical) assumptions about the environment;
if the assumptions are not satisfied, the system may be compromised
Denial of service attacks
- may be difficult to detect if the traffic (partially) matches the allowed statistical pattern

PAIN: Privacy, Availability-Authentication, Integrity, Non-repudiation
Parkerian Hexad (Donn Parker, 2002):
- confidentiality
- possession / control (important even without violating confidentiality)
- integrity
- authenticity (of origin or author)
- availability
- utility (e.g. data converted to a useless format ≠ availability)
[Handbook of Applied Cryptography]: signature, authorization, access control, timestamping, witnessing (by someone other than the originator), confirmation, anonymity, revocation, traceability / accountability

Confidentiality, integrity, availability are security services
We discuss threats (potential) and attacks (real) to those services
Threat classification [R. Shirey, cf. M. Bishop]
- disclosure
- deception (forcing acceptance of false data)
- disruption = interrupting / stopping normal service
- usurpation = unauthorized control of part of a system

Microsoft STRIDE threat model
- Spoofing identity: impersonating
- Tampering with data: falsifying / attack on integrity
- Repudiation: negating the effect of an action
- Information disclosure: attack on confidentiality
- Denial of service: attack on availability
- Elevation of privilege: unauthorized additional rights

Examples of threats
- interception (snooping); in particular: (passive) wiretapping
- modifying / altering data => deception, also interruption / usurpation (gaining control)
  active wiretapping, man-in-the-middle attack (actively changing content)
- impersonation (masquerading, spoofing)
- repudiation of origin (e.g. in commercial transactions)
- denial of receipt: a form of deception
- delay: could be service interruption, also usurpation
- denial of service

Design principles
a) Economy of mechanism: keep the design as simple and small as possible
   unwanted access paths will not be noticed during normal use
   => security by design, not as an afterthought
b) Fail-safe defaults: base access decisions on permission rather than exclusion (default deny)
c) Complete mediation: check every access, every time (including in exceptional cases, maintenance)
   NOT based on previously taken decisions
d) Open design (NOT: security through obscurity)
   => mechanisms may be publicly checked to gain trust
e) Separation of privilege: separation increases robustness
f) Least privilege: every program and user should operate with the minimal set of privileges needed for the given task
g) Least common mechanism: minimize common resources, interference among users, the mechanisms on which everything is based
h) Psychological acceptability: not unduly interfere with common activity
   if mechanisms are not simple, they will be misused or bypassed
2 additional ones:
- Work factor: compare the needed effort with the attacker's resources
- Compromise recording: in case of failure, alarm / audit is still useful

Other principles
- the weakest link determines the security of the entire system
- adequate protection principle: not maximal security, but utility at acceptable risk / cost
- principle of efficiency (cf. acceptability): appropriate, easy to use correctly
- defense in depth: layered protection
[Ninghui Li, CS 426: Computer Security, course, Purdue University]

Attack actions
- "probe": access a target to determine its characteristics
- "scan": systematically access (probe) several targets
- "flood": repeated access to a target to overload it
- authenticate: present an identity for verification and subsequent access
- bypass: circumvent a control / authorization process, using an alternate method to access a target
- spoof / masquerade: assume some other identity
- read
- copy
- steal (take into possession and eliminate the original)
- modify
- delete
Unauthorized results
- unauthorized (increased) access to a system or network
- information disclosure (attack on confidentiality)
- information corruption (attack on integrity)
- denial of service (attack on availability)
- theft of resources (unauthorized use): a type of usurping the resource

Error modes: passive vs. active (does not do what it should vs. does what it shouldn't)
Danger of errors in rare cases
Security imbalances: effect of large-scale technologies
Fragile (brittle) systems vs. systems resilient to errors
Protection methods: adaptive to unforeseen situations
Monocultures (homogeneous systems): vulnerable to the same attack
  e.g. the majority of systems is running Windows
Security is a human & social problem

In security, we make statements about various entities
These statements are not absolute, they are based on assumptions
=> Security is a matter of trust: in whom / what can we trust?
Ken Thompson: Reflections on Trusting Trust (Turing Award Lecture '83)
  inserted a trojan into the login program and the C compiler, to accept a special password (known by the originator), by using self-reproducing code
"You can't trust code that you did not create yourself"
"No amount of source-level verification or scrutiny will prevent you from using untrusted code"

File permissions
Every file is owned by a user and a group
Individual permission bits: read, write, execute / search
3 groups of bits for: user, group, others
The meaning for directories is more complex than for files:
- r is needed for read(), readdir(), opendir() => for ls
- x ("search") is needed for chdir() and stat() (any file)
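The three groups of permission bits described above can be decoded from the `st_mode` field that `stat()` returns. The following is a minimal sketch, assuming a POSIX system; `perm_string` and `path_perm_string` are hypothetical helper names, and only the nine basic bits are handled (no special bits).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Writes the nine permission characters (user, group, others) followed by
   '\0' into out, in the order used by ls -l, e.g. "rwxr-x---". */
void perm_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,
        S_IRGRP, S_IWGRP, S_IXGRP,
        S_IROTH, S_IWOTH, S_IXOTH
    };
    assert(out != NULL);
    memcpy(out, "rwxrwxrwx", 10);      /* copies the terminating '\0' too */
    for (int i = 0; i < 9; i++)
        if (!(mode & bits[i]))
            out[i] = '-';              /* bit not set: permission absent */
}

/* Looks up path with stat() and decodes its permission bits.
   Returns 0 on success, -1 if stat() fails (for example when x is
   missing on some directory along the path). */
int path_perm_string(const char *path, char out[10])
{
    struct stat st;
    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    perm_string(st.st_mode, out);
    return 0;
}
```

For example, `perm_string(0750, buf)` yields `"rwxr-x---"`: all three bits for the user, read and execute / search for the group, nothing for others.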
What permissions are needed to read a file?
  x on the entire path and r for the file

What permissions are needed for ls -l name?
  needs info from the inode, thus x on the parent directory (also, x on the path);
  independent of the permissions on name
  if name is a directory, ls -l lists its contents (needs r);
  ls -ld only gives directory info, so the answer is as above

What permissions are needed to delete a file?
  w in the parent directory, as well as x
  Need not have w for the file!

What can you do with x on a directory but not r?
  You can access a file with a known name, but can't search for a file
  (e.g. search for a file on a web server)

Special bits:
- sticky bit: for a directory: a file can only be deleted by its owner
- set user ID: execute with the effective ID of the file owner
- set group ID: execute with the effective ID of the file group

A process has (in most newer versions) three user-related identifiers:
- real user ID: (initial) owner of the process
- effective user ID: determines access rights
- saved user ID: used to revert to a previous UID
Normally: ruid = euid = user launching the process
Exception: euid = owner of the loaded executable, when it has the s (setuid) bit set
  => running with other privileges (e.g. elevated)
(similar for group identifiers)

Q1: Why do we need functions to manipulate UIDs at runtime?
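One possible answer to Q1 is that a setuid program may want to hold its elevated privileges only for the few operations that need them. The sketch below assumes a POSIX system and uses the standard `getuid`, `geteuid` and `seteuid` calls; the wrapper `run_with_real_uid` and the sample callback `euid_matches_ruid` are hypothetical names introduced for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Runs fn(arg) with the effective UID temporarily set to the real UID and
   stores fn's return value in *result. Returns 0 on success, -1 if the
   effective UID could not be switched or restored. */
int run_with_real_uid(int (*fn)(void *), void *arg, int *result)
{
    uid_t privileged = geteuid();      /* the identity to come back to */
    assert(fn != NULL && result != NULL);
    if (seteuid(getuid()) != 0)        /* drop to the launching user */
        return -1;
    *result = fn(arg);
    if (seteuid(privileged) != 0)      /* revert to the previous euid */
        return -1;
    return 0;
}

/* Sample callback: reports whether the effective UID equals the real UID. */
int euid_matches_ruid(void *unused)
{
    (void)unused;
    return geteuid() == getuid();
}
```

In a program started without the setuid bit all identifiers are equal, so both `seteuid` calls succeed trivially; in a setuid program the second call is expected to succeed because the previous effective UID is still available as the saved user ID.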
Q2: Why is saving the old UID not left to the programmer?

setuid(val)
- if euid = 0 (root): sets ruid = euid = val (and saved uid too) => UIDs / privileges are set
- else (euid != 0): can only set euid = val if val is the real or saved uid; ruid and saved uid are unchanged
Q3: What are the limitations if only this call exists?

seteuid(val)
- allowed only if euid == 0 or if val is one of the three values (euid, ruid, saved uid)
- sets only euid, does not change ruid and saved uid
- changes are reversible by another seteuid call

A security policy is a statement of what is, and what is not, allowed.
A security mechanism is a method, tool or procedure for enforcing a security policy.
[Bishop, Computer Security: Art and Science]
=> we need to check if the mechanism is correct
A mechanism may be:
- safe (does not allow states disallowed by the policy)
- precise (allows exactly what the policy specifies)
- broad (allows more than the policy does)

Access control: a mechanism to allow or deny an entity's access to a resource
  "principal" / subject -> request -> guard / monitor -> object
Access control consists of two steps:
- authentication: Who made the access request?
- authorization: Does subject s have access rights for resource o?

We distinguish:
- a set of subjects or principals S
- a set of objects O
- a set of access modes A
Simplest: A = {observe, alter}. Usually not enough.
The Bell-LaPadula model refines this to: A = {execute, read, append, write}
When are distinctions between these modes useful?
- log: append, without changing prior contents
- execute encryption, without knowing the key

Marius Minea, marius@cs.upt.ro, 28 September 2016

Goals:
- improve skills: write robust, secure code
- understand program internals
- learn about security vulnerabilities, detection, prevention
- use tools to reverse engineer and analyze code
- perhaps in the future: analyze and counter malware

We know the basics: logically, the program has different memory areas:
- code
- (global) data
- stack (for function calls)
- heap (for dynamic allocation)

What can we find out about them by running a program? (look at the various addresses printed by progsegs.c)
- Addresses are in different numeric ranges
- Recursive call: new copies of local variables for each instance; can determine the size of a stack frame
- Total address range (from code to stack) is HUGE: orders of magnitude more than computer memory => these are logical (virtual), not physical addresses
- Running the program repeatedly, addresses differ; estimate: how many bits vary?
Address space layout randomization (ASLR) protects against attacks that need to know address values

Memory layout of a C program (from high to low addresses):
- command-line arguments and environment variables
- stack
- heap
- uninitialized data (bss): initialized to zero by exec
- initialized data: read from program file by exec
Figures: http://www.geeksforgeeks.org/memory-layout-of-c-program/ and www.backerStreet.com (page on stack frames)

Virtual memory: a mapping from logical to physical addresses, supported by processor hardware (memory management unit) and operating system
- provides abstraction (program need not worry about size and usage of physical memory)
  virtual address space can be larger than physical memory
  memory transferred to / from secondary memory (disk) as needed
- provides protection: can set up access rights for memory segments
  memory space of one process protected from another
  but: can also set up shared memory
Figure 8.3: Address Translation in a Paging System [W. Stallings, Operating Systems, 6th ed.]

What difference (if any) is there between char s[] = "test"; and char *p = "test"; ?

Array: char s[] = "test";
- s[0] is 't', s[4] is '\0' etc.
- s is a constant address (char *), not a variable in memory
- CANNOT assign s = ... but may assign s[0] = 'f'
- sizeof(s) is 5 * sizeof(char)
- &s is s, but of different type, address of a 5-char array: char (*)[5]
- sizeof (entire array) is not strlen (up to '\0')

Pointer: char *p = "test";
- p[0] is 't', p[4] is '\0' (same)
- p is a variable of type (char *), has a memory location
- CANNOT assign p[0] = 'f' ("test" is a string constant)
- can do p = s; then p[0] = 'f';
- can assign p = "ana";
- sizeof(p) is sizeof(char *)
- &p is NOT p => WRONG: scanf("%4s", &p); RIGHT: scanf("%4s", p); (if p is a valid address and has room)

The name of an array is a constant address.
Can declare type a[LEN], *pa; and assign pa = a;
Similar: a and pa have the same type: type *
But: pa is a variable, uses memory; can assign pa = addr
a is a constant (array has fixed address); cannot assign a = addr
*a and *pa: indirections with different operations in machine code:
- *a references the object at a constant address (direct addressing)
- *pa must first get the value of variable pa (an address), loading it from the constant address &pa, then dereference it (indirect addressing)

Suppose we want to process a bitmap file

Bitmap file header
This block of bytes is at the start of the file and is used to identify the file. A typical application reads this block first to ensure that the file is actually a BMP file and that it is not damaged. The first 2 bytes of the BMP file format are the character "B" then the character "M" in ASCII encoding. All of the integer values are stored in little-endian format (i.e. least-significant byte first).

Offset (hex)  Offset (dec)  Size     Purpose
00            0             2 bytes  The header field used to identify the BMP and DIB file is 0x42 0x4D in hexadecimal, same as BM in ASCII. The following entries are possible: BM - Windows 3.1x, 95, NT, etc.; BA - OS/2 struct bitmap array; CI - OS/2 struct color icon; CP - OS/2 const color pointer; IC - OS/2 struct icon; PT - OS/2 pointer
02            2             4 bytes  The size of the BMP file in bytes
06            6             2 bytes  Reserved; actual value depends on the application that creates the image
08            8             2 bytes  Reserved; actual value depends on the application that creates the image
0A            10            4 bytes  The offset, i.e. starting address, of the byte where the bitmap image data (pixel array) can be found
https://en.wikipedia.org/wiki/BMP_file_format

Windows BITMAPINFOHEADER

Offset (hex)  Offset (dec)  Size (bytes)  Purpose
0E            14            4             the size of this header (40 bytes)
12            18            4             the bitmap width in pixels (signed integer)
16            22            4             the bitmap height in pixels (signed integer)
1A            26            2             the number of color planes (must be 1)
1C            28            2             the number of bits per pixel, which is the color depth of the image. Typical values are 1, 4, 8, 16, 24 and 32
1E            30            4             the compression method being used. See the next table for a list of possible values
22            34            4             the image size. This is the size of the raw bitmap data; a dummy 0 can be given for BI_RGB bitmaps
26            38            4             the horizontal resolution of the image (pixel per meter, signed integer)
2A            42            4             the vertical resolution of the image (pixel per meter, signed integer)
2E            46            4             the number of colors in the color palette, or 0 to default to 2^n
32            50            4             the number of important colors used, or 0 when every color is important; generally ignored

To work with ints that are 2 bytes, 4 bytes, etc., need stdint.h (since C99):
int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t

BMP specification: "all integers are stored in little-endian format"
- little-endian: least-significant byte first; 0x12345678 is stored as 0x78 0x56 0x34 0x12 (Intel x86)
- big-endian: most-significant byte first; 0x12345678 is stored as 0x12 0x34 0x56 0x78 (Mac, PPC, Sun, Internet; also called 'network byte order')
Make sure values are read / written from / to file in the correct byte order

Program analysis frameworks
Allow program representation and manipulation at source or binary level
Built-in analyses + API to write your own
- LLVM: one of the most widely used, complete compiler toolchain
- BAP (D. Brumley, CMU): OCaml + Python bindings; team won DARPA Cyber Grand Challenge 2016
- angr (UC Santa Barbara): Python framework
- CIL (G. Necula, Berkeley): OCaml + Perl; outputs
instrumented C code

Analysis library provides a data type to represent statements:
    stmtkind =
      | Instr of instr list
      | Return of exp option * location
      | Goto of stmt ref * location
      | Break of location
      | Continue of location
      | If of exp * block * block * location
      | Switch of exp * block * (stmt list) * location
      | Loop of block * location * (stmt option) * (stmt option)
      | Block of block
    instr =
      | Set of lval * exp * location
      | Call of lval option * exp * exp list * location
    lval = lhost * offset
    lhost =
      | Var of varinfo
      | Mem of exp
To instrument code: traverse statements (control flow graph), identify interesting statements, insert new ones; e.g. can log all / some memory writes

Address sanitizer (with recent clang / gcc versions)
    int main(void) {
      char *p = malloc(20);
      strcpy(p, "...");
      puts(p);
      free(p);
      p[1] = 'x';    /* write after free */
    }
    % gcc -fsanitize=address usefree.c
    % ./a.out
    ==31741==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300000efe1 at pc 0x0000004008c6 bp 0x7ffeef2227b0 sp 0x7ffeef2227a8
    WRITE of size 1 at 0x60300000efe1 thread T0
        #0 0x4008c5 in main /home/marius/curs/bitdef/usefree.c:11

Research directions:
- automated vulnerability detection + exploit generation
- comparison of old (buggy) + patched program versions => exploit generation
- 'compilers' for return-oriented programming exploits
A good read (insights into research advances): G. Vigna et al., (State of) The Art of War: Offensive Techniques in Binary Analysis, IEEE Security & Privacy, 2016

Marius Minea, marius@cs.upt.ro, http://cs.upt.ro/~marius/curs/cp/, 27 September 2016

About the course:
- no prior knowledge needed; for those who know, hopefully learn more
- imperative programming in C; some insight into alternatives
- handle errors, test your code, think of corner cases

The C language
- developed in 1972 at AT&T Bell Laboratories by Dennis Ritchie, together with the UNIX operating system and its tools (C first developed under UNIX, then UNIX was rewritten in C)
- Brian Kernighan, Dennis Ritchie: The C Programming Language (1978)
- Mature language, but still evolving: ANSI C standard, 1989 (American National Standards Institute), then ISO 9899 standard (versions: C90, C99, ...)
- direct access to data representation, freedom in working with memory, good hardware interface, large code base (libraries for many purposes)
- good compilers that generate compact, fast code
- very easy to make mistakes!

A program reads input data, processes it through (mathematical) computations, and writes (produces) results.
In mathematics, computations are expressed by functions:
- we use predefined functions (sin, cos, etc.)
- we define new functions (for the given problem)
- we combine functions into more complex computations
In programming, we use functions in a similar way.

Programs are structured into functions (methods, procedures)
- Splitting into functions helps: NOT one huge piece of code
- Functions can be reused, making development efficient
- Functions are core for the functional programming paradigm: computation is function evaluation, not assignment
- Functions are core to defining what is computable (recursive functions, lambda calculus)

Squaring for integers: sqr : Z -> Z, sqr(x) = x * x
    int sqr(int x)      (function type, function name, parameter type and name)
    {
      return x * x;
    }
A function definition contains:
- the function header, specifying: the type (range) of function values (int), the function name (sqr) and the parameters (the integer x)
- the function body, within { }: here, the return statement with an expression that gives the function value from its parameters
There are precise rules for writing in the language (the syntax): language elements are written in a given order; separators are used to precisely delimit them: ( ) ; { }
concrete syntax: detail (keywords, punctuation) vs abstract syntax: essence (language elements / concepts)
Essence (abstract syntax):
- names: function, parameter(s)
- types: of parameter(s) and return value; cannot omit (some languages: can infer types); one precise type (some languages: polymorphism, overloading)
- expression (what is computed)
Details (concrete syntax): keyword (return), punctuation: ( ) { } ; order (types first)

Squaring for reals: sqrf : R -> R, sqrf(x) = x * x
    float sqrf(float x)
    {
      return x * x;
    }
Another function domain and range (reals) => a different function; even the * operator is now defined on a different set (type)
Need a different name to distinguish it from sqr in the same program

int and float denote types. A type is a set of values together with a set of operations allowed for these values.
For reals, it is preferable to use the type double (double precision) (used by library functions: sin, cos, exp, etc.)

Numeric types differ in C and mathematics
- in math: Z is a subset of R, both are infinite, R is dense / uncountable
- in C: int, float, double are finite; both integers and reals have limited range, reals have limited precision
Important to remember this! (overflows, precision loss)
Default math functions use double, you should too!

The type of numeric constants depends on their writing:
- 2 is an integer, 2.0 is a real
- scientific notation for reals: 1.0e-3 instead of 0.001
- writing 1.0 or 1. is equivalent, same for 0.1 and .1

Operators: + - * /
Multiplication must be written explicitly! We can't write 2x, but 2 * x (or x * 2)
Some operators have different meanings for integers and reals, and different results!
Integer division has an integer result! (division with remainder)
- 7 / 2 is 3, but 7.0 / 2.0 is 3.5
- -7 / 2 is -3, likewise -(7 / 2) (integer division truncates towards zero)
The operator % (remainder) is only defined for integers
     9 /  5 =  1      9 %  5 =  4
     9 / -5 = -1      9 % -5 =  4
    -9 /  5 = -1     -9 %  5 = -4
    -9 / -5 =  1     -9 % -5 = -4
Rule for integer division: a = a / b * b + a % b => the sign of the remainder is the same as the sign of the dividend

Lexical elements
- Keywords: have a predefined meaning (cannot be changed). Examples: statements (return), types (int, float, double)
- Identifiers (e.g. sqr, x): chosen by the programmer to name functions, parameters, variables, etc. An identifier is a sequence of characters comprised of letters (upper and lower case), underscore _ and digits, which does not start with a digit and is not a keyword. Examples: x3, a12_34, exit, main, printf, int16_t
- Constants: integer: -2; floating point: 3.14; character: 'a'; string: "a"
- Punctuation signs, with various meanings: * is an operator; ; terminates a statement; parentheses ( ) around an expression or function parameters; braces { } group declarations or statements

Example: the discriminant of a quadratic equation a * x^2 + b * x + c = 0
    float discrim(float a, float b, float c)
    {
      return b * b - 4 * a * c;
    }
Between the parentheses ( ) of the function header there can be arbitrarily many comma-separated parameters, each with its own type; must give the type for each parameter, even if the types are the same

So far, we have only defined functions, without using them.
The value of a function can be used in an expression.
Syntax: like in mathematics: function(param, param, ..., param)
Example: in the discriminant, we could use the sqrf function:
    return sqrf(b) - 4 * a * c;
Or, using the previously defined sqr function we can define:
    int cube(int x)
    {
      return x * sqr(x);
    }
IMPORTANT: in C, any identifier must be declared before use (we must know what it represents, including its type).
The above examples assume that sqrf and sqr are defined before discrim and cube respectively in the program.

    int main(void)
    {
      return 0;
    }
The smallest program: it does not do anything!
Any program contains the function main and is executed by calling it at program start; in main, other functions may be called
- Here, main does not have any parameters (void); void is a keyword for the empty type (without any element)
- main returns an int, interpreted as exit status by the operating system: 0 = successful termination, a value != 0 is an error code
- return 0; at the end of main is optional (if the end brace is reached, 0 is returned by default; still, most programs have it explicit)

Comments
    /* this is a comment */
    int main(void)
    {
      return 0;   // this is a comment too
    }
Programs may contain comments, placed between /* and */ or starting with // until (and excluding) the end of the line
Comments are stripped by the preprocessor. They have no effect on code generation or program execution.
Programs should be commented:
- so a reader can understand (including the writer, at a later time)
- as documentation (may specify functionality, restrictions, etc.)
- explain function parameters, result, local variables
- specify preconditions, postconditions, error behavior

Printing
    #include <stdio.h>
    int main(void)
    {
      printf("hello, world!\n");
      return 0;
    }
printf (from "print formatted"): a standard library function; is NOT a statement or a keyword
- is called here with one string parameter
- string constants are written with double quotes " "
- \n denotes the newline character
The first line is a preprocessor directive; it includes the header file stdio.h, which contains the declarations of the standard input / output functions
- declaration = type, name, parameters: needed to use the function
- implementation (compiled object code): in a library which is linked at compile-time, loaded at execution time

Printing numbers
    #include <math.h>
    #include <stdio.h>
    int main(void)
    {
      printf("%f\n", cos(0));
      return 0;
    }

    #include <stdio.h>
    int sqr(int x)
    {
      return x * x;
    }
    int main(void)
    {
      printf("%d\n", 2 * sqr(-3));
      return 0;
    }
To print the value of an expression, printf takes two arguments:
- a character string (format specifier): %d or %i (decimal integer), %f (floating point)
- the expression; its type must be compatible with the specified one (programmer must check! compiler may warn or not)
Sequencing: in a function, statements are executed in textual order
But: the return statement ends function execution (no further statement is executed)

We cannot print a number like this: printf(5)
We can write printf("5"), but this means printing a string (although the effect is the same: one character printed)
The first argument of printf must always be a string, with or without format specifiers (special characters)

Function definition and function call: two distinct things
- function definition: int sqr(int x) { ... }
- function call: sqr(2), sqr(a), etc.
Function definitions use names (of parameters, variables, etc.)
Function calls work with values (2, the value of a, etc.); they do NOT compute with symbolic expressions

This program computes 2^6 = (2 * 2^2)^2
    #include <stdio.h>
    int sqr(int x)
    {
      printf("the square of %d is %d\n", x, x*x);
      return x * x;
    }
    int main(void)
    {
      printf("2 to the 6th is %d\n", sqr(2 * sqr(2)));
      return 0;
    }
What is the order of printed statements?
    the square of 2 is 4
    the square of 8 is 64
    2 to the 6th is 64

In C, function arguments are passed by value:
- all function arguments are evaluated (their value is computed)
- the values are assigned to the formal parameters (names from the function header)
- then, the function is called and executes with these values
This type of argument passing is named call by value.
The program starts executing main. The first statement: printf("2 to the 6th is %d\n", sqr(2 * sqr(2)));
Before doing the call, printf needs the values of its arguments:
- first argument: the value is known (a string constant)
- second argument: need to call sqr(2 * sqr(2)); the outer sqr also needs the value of its argument 2 * sqr(2) => need to call sqr(2) first
Call order: first sqr(2), then sqr(8), then printf

C does NOT do the following (other languages might):
Functions do NOT start execution without computed arguments
- printf would print "2 to the 6th is ", then need the value
- it would call the outer sqr, which writes "the square of ", then would need x
- it would call sqr(2), write "the square of 2 is 4", return 4, etc.
Function parameters are NOT substituted with expressions
- printf would call the outer sqr with the expression 2 * sqr(2)
- sqr(2) would be called twice, for (2*sqr(2))*(2*sqr(2))
=> in C, a function computes with values, never with expressions

Absolute value: abs : Z -> Z
    abs(x) = x     if x >= 0
             -x    otherwise (x < 0)
=> need a language construct that allows us to decide which expression to evaluate, based on a condition (true / false)
Syntax of the conditional expression: condition ? expr1 : expr2
- if the condition is true, only expr1 is evaluated; its value becomes the result of the entire expression
- if the condition is false, only expr2 is evaluated and its value becomes the value of the expression
    int abs(int x)
    {
      return x >= 0 ? x : -x;
    }
Comparison operators: == (equality), != (different), <, <=, >, >=
IMPORTANT! The equality test in C is == and not simple = !!!
Note: abs exists as a standard function, declared in stdlib.h

Sign: sgn : Z -> {-1, 0, 1}
    sgn(x) = -1   if x < 0
              0   if x = 0
              1   if x > 0
The conditional operator has only one condition and two branches
But: either of the expressions can be arbitrarily complex => must decompose the decision based on the value of x
Decomposition into subproblems: key in problem solving
We rewrite the function with a single decision at any given point:
    sgn(x) = -1                       if x < 0
             (0 if x = 0, else 1)     else (x >= 0)
    int sgn(int x)
    {
      return x < 0 ? -1 : (x == 0 ? 0 : 1);
    }

The minimum of two numbers is easily written:
    int min2(int x, int y)
    {
      return x < y ? x : y;
    }
The minimum of three numbers, by nested decisions:
    if x < y:        if x < z then x else z
    else (x >= y):   if y < z then y else z
We notice the structure of min2 is repeated => can do it simpler:
The result is the minimum between the minimum of the first two numbers and the third => just apply min2 twice!
    int min3(int x, int y, int z)
    {
      return min2(min2(x, y), z);
    }

From mathematics, we know recurrence relations for sequences:
- arithmetic sequence: x_0 = b (i.e. x_n = b for n = 0), x_n = x_{n-1} + r for n > 0
  Example: 1, 4, 7, 10, 13, ... (b = 1, r = 3)
- geometric sequence: x_0 = b (i.e. x_n = b for n = 0), x_n = x_{n-1} * r for n > 0
  Example: 3, 6, 12, 24, 48, ... (b = 3, r = 2)
x_n is not computed directly, but step by step, using x_{n-1}
A notion is recursive if it is used in its own definition
Exercise: write recurrences for: C_n^k, the Fibonacci sequence, ...

Recursion is fundamental in computer science: it reduces a problem to a simpler case of the same problem
- objects: a sequence is either a single element, or an element followed by a sequence
  e.g. word (sequence of letters); number (sequence of digits)
- actions: a path is either a step, or a path followed by a step
  e.g. traversing a path in a graph
- an expression can be: a number (7), an identifier (x), ...

The power function
    x^n = 1              if n = 0
          x * x^(n-1)    otherwise (n > 0)
    #include <stdio.h>
    float pwr(float x, unsigned n)
    {
      return n==0 ? 1 : x * pwr(x, n-1);
    }
    int main(void)
    {
      printf("%f\n", pwr(-2.0, 3));
      return 0;
    }
unsigned: the type of nonnegative integers (natural numbers)
The header of pwr is a declaration of the function, so it can be used in its own function body (recursive call)
Even if we write pwr(-2, 3), -2 (int) will be converted to float (the type declared for each parameter is known)

The pwr function does two computations:
- a test (n == 0 ?); if so, return 1
- else, a multiply; the right operand requires a recursive call
    pwr(5, 3) = 5 * pwr(5, 2)    -> 125
    pwr(5, 2) = 5 * pwr(5, 1)    -> 25
    pwr(5, 1) = 5 * pwr(5, 0)    -> 5
    pwr(5, 0)                    -> 1
In the recursive computation of the power function:
- Every call makes a new call, until the base case is reached
- Every call executes the same code, but with different data (own values for parameters)
- When reaching the base case, all started calls are still unfinished (each has to perform the multiplication with the result of the call it made)
- Returning is done in reverse order of the calls (the call with exponent 0 returns, then the one with exponent 1, etc.)

Marius Minea, 26 September 2016

Programming languages
- are [...] and one of the oldest CS fields
- [...] is an important current issue
- mainstream languages still appear and evolve (Java, C#, ...) + lots of [...] languages
- [...] impacts [...] (polymorphism, reflection, ...), [...] (type safety, interference), [...] (compilation, ...), etc.
- [...]: needed in verification, testing, parallelization, certification, performance estimation, ...
SIGPLAN motto: "To explore programming language concepts and tools focusing on design, implementation and efficient use."

Course goals:
- [...] of programming languages
- understand [...] and impact of [...]
- learn language [...]
- [...] program [...] (semantics, reasoning)
- introduction to current programming language [...]
"a programming language is a tool which should assist the programmer in the most difficult aspects of his art, namely program design, documentation, and debugging" [Hoare, Hints on programming language design, 1973]

Main programming language conferences (ACM SIGPLAN)
- POPL: Principles of Programming Languages
- PLDI: Programming Language Design and Implementation
- OOPSLA: Object-Oriented Programming, Languages, Systems and Applications (now: SPLASH)
All of them have "most influential paper award" (10
years later) + best paper award (current year) + 20 years of PLDI (1979-1999)

Concepts: symbolic computation; lazy evaluation; closures; higher-order functions and continuations; concurrency, inter-process communication and synchronization; active objects and mobile agents; object views, directed interfaces, and dynamic type systems; reflection and introspection; persistent object systems and garbage collection; error management; assertions and declarative debugging; aspect-oriented programming; generative programming; constraint imperative programming; staged compilation and virtual machines
[[...] course, Linköping University]

Functional programming
- simple mathematical foundation: lambda calculus (possibly typed)
- in pure form avoids [...] and [...]
"The determined Real Programmer can write functional programs in any language" (paraphrasing Ed Post)
Exercise 1: program without state and variables in C
Exercise 2: simulate state and an interpreter in Haskell / ML (lab)

Programming encompasses three things:
1. a computation model: a formal system that defines a language and how it is executed
2. a set of programming techniques and design principles used to write programs in that language
3. a set of reasoning techniques for reasoning about programs and calculating their efficiency
[van Roy & Haridi, Concepts, Techniques and Models of Computer Programming]

Programming paradigm = approach to programming based on a mathematical theory or a coherent set of principles
many languages => fewer paradigms => still fewer concepts
Key concepts form a paradigm's kernel language

Functional programming
- Discipline and idea: mathematics and the theory of functions
- Values produced are non-mutable: impossible to change part of a composite value, but can make a revised copy of a composite value
- Atemporal: no matter when done, a computation produces the same value; pure functional programming is side-effect free
- Applicative: all computations are done by applying (calling) functions
- Functions are the natural abstraction (for expression evaluation)
- Functions are first-class values: full-fledged data just like numbers, lists, ...
- Computations driven by needs
[after K. Nørmark, [...] course, Aalborg U]

A first-class object is one that can be:
- passed as an argument
- returned as a value, and
- stored in a data structure
What is first-class influences your choices of abstraction: languages with first-class functions can represent data as procedures
Example: represent environments
- two constructors: empty environment; enlarge environment with a (symbol, value) pair
- one observer: give the value of a symbol in the environment

Functional / declarative operations are:
- independent (do not depend on any external execution state)
- stateless (no internal execution state remembered between calls)
- deterministic (same result when given same arguments)
Why is functional programming important?
- Declarative programs are compositional
- naturally concurrent (since stateless)
- Reasoning about declarative programs is simple
[van Roy & Haridi]

"This book brings you face-to-face with the most fundamental idea in computer programming: The interpreter for a computer language is just another program" [Hal Abelson, foreword to Friedman, Wand, Haynes, Essentials of Programming Languages]
Writing an interpreter:
- makes you think about fundamental [...]
- defines the meaning of programs: [...]
=> our first lab assignment

Binding: associating a name / identifier to an object (expression / value)
- static binding: before running the program (e.g., usual function call)
- dynamic binding: at runtime (e.g., OO virtual method call)
Binding and variable assignment are NOT the same
Pure functional languages have binding but do NOT have assignment (mutable values)
Rebinding and mutation are NOT the same

Scope = a context to which objects (names, etc.) are associated; an identifier is visible within its scope
- static scoping: determined by program text, not by runtime execution sequence; aids modularity, understanding, reasoning (in isolation)
- dynamic scoping: scope = remainder of the execution during which the binding is in effect; each identifier has a stack of bindings (push / pop on enter / exit scope); meaning of code depends on past execution (of other code)
Some languages allow a choice of static / dynamic scoping (e.g., Perl)

Functions can be: passed as an argument, returned as a value, and stored in a data structure
    Ex.  List.map (fun x -> x + 1)         (ML)
         Data.List.map (\x -> x + 1)       (Haskell)
Functions that return a function, e.g.
    (+) : int -> int -> int = <fun>        (ML)
    (+) 3 : int -> int = <fun>             (same as fun x -> x + 3)
A function of several parameters can be rewritten through currying (after Haskell Curry):
    fun x y -> x + y    is    fun x -> fun y -> x + y

Closure = a function together with an environment defining its free variables
needed to implement static scoping with first-class functions
Python example [cf. Wikipedia]
    def counter():
        x = 0
        def inc():
            nonlocal x
            x += 1
            print(x)
        return inc
    counter1_inc = counter()
    counter2_inc = counter()
    counter1_inc()   # 1
    counter1_inc()   # 2
    counter2_inc()   # 1
    counter1_inc()   # 3

Marius Minea, September 28, 2016

Topics:
- Black-box testing (no source access)
- Glass-box / white-box testing (with source access): generating unit tests, test coverage metrics
- Static analysis (of source code)
- Dynamic analysis
- Testing object-oriented programs
- Testing concurrent programs
- Formal verification of programs / models
- Automated test generation, including model-based testing
- Security testing
- Design of a test plan

Therac-25
- medical system for radiation therapy
- 6 accidents with fatalities and grave wounds (1985-87, US, Canada)
- direct cause: errors in the control program
[Leveson 1995]:
- excessive trust in software when designing the system
- lack of hardware safety measures (interlocks)
- lack of appropriate software engineering practices (defensive design, specification, documentation, simplicity, formal analysis, testing)

Ariane 5
- Self-destruction after a fault 40 seconds from launch (1996)
- Cause: conversion of a 64-bit float to a 16-bit int generated an unhandled overflow exception in the Ada program
- Cost: $500 million (rocket), $7 billion (whole project)
main cause: code taken
from the Ariane 4, without proper analysis:
- execution of faulty code that was no longer needed at that flight stage
- for Ariane 4, absence of overflow had been proved for the unprotected variables, but the new settings were different
bad design of redundancy: the inertial reference system and its backup were taken out by the same error

Pentium FDIV bug
Error in the floating-point division algorithm
- SRT division algorithm, base 4
- determines the next quotient digit from a lookup table
- some entries erroneously marked as "don’t care"
- cost: some $500 million
Circuit could have been verified formally
- by automated theorem proving, or
- with special data structures to represent multiplication and division
but verification effort was focused on more complex parts (execution unit, cache coherence protocol)

Mars Pathfinder, 1997
- once on Mars, the spacecraft was frequently resetting
- cause: priority inversion between processes with common resources
- the problem and solution had been described in the literature [Sha, Rajkumar, Lehoczky: Priority Inheritance Protocols, 1990]
1. low-priority process A acquires resource R
2. A interrupted by C (high priority)
3. C awaits availability of R; A resumes execution
4. A interrupted by long-running B (priorities: A < B < C)

a test discovers (and localizes) an error
The role of a tester is to find errors
- as early as possible (repair cost increases with time)
- and ensure they get fixed (reports, debugging, maintenance)
(Patton, Software Testing)

They are explorers
They are troubleshooters
They are relentless
They are creative
They are (mellowed) perfectionists
They exercise good judgment
They are tactful and diplomatic (?)
They are persuasive (!)
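
Returning to the Mars Pathfinder scenario (processes A, B, C sharing resource R): the following is a minimal Python sketch of priority inversion and of the priority inheritance remedy. It is a toy tick-based scheduler, not the spacecraft's actual software. The process names and priority order follow the slide; the release times, the amounts of work, and the names `Task` and `simulate` are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare tasks by identity, not by field values
class Task:
    name: str
    priority: int              # larger number = higher priority
    release: int               # tick at which the task becomes ready
    work: int                  # ticks of CPU time still needed
    needs_r: bool              # True if the task must hold resource R while running
    finished_at: Optional[int] = None


def simulate(inheritance: bool) -> dict:
    """Run the three-task scenario; return the finishing tick of each task."""
    # Hypothetical workload: A (low) and C (high) both need R, B (medium) does not.
    tasks = [
        Task("A", priority=1, release=0, work=4, needs_r=True),
        Task("B", priority=2, release=2, work=5, needs_r=False),
        Task("C", priority=3, release=1, work=2, needs_r=True),
    ]
    holder = None  # task currently holding R, if any
    tick = 0
    while any(t.finished_at is None for t in tasks):
        ready = [t for t in tasks if t.release <= tick and t.finished_at is None]
        # A task is blocked if it needs R while another task holds it.
        blocked = [t for t in ready
                   if t.needs_r and holder is not None and holder is not t]
        runnable = [t for t in ready if t not in blocked]

        def effective_priority(t):
            # Priority inheritance: the holder of R temporarily runs at the
            # priority of the highest-priority task it is blocking.
            if inheritance and t is holder and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        current = max(runnable, key=effective_priority)
        if current.needs_r:
            holder = current          # acquire (or keep holding) R
        current.work -= 1             # run for one tick
        tick += 1
        if current.work == 0:
            current.finished_at = tick
            if holder is current:
                holder = None         # release R on completion
    return {t.name: t.finished_at for t in tasks}


print(simulate(inheritance=False))  # {'A': 9, 'B': 7, 'C': 11}
print(simulate(inheritance=True))   # {'A': 4, 'B': 11, 'C': 6}
```

Without inheritance, the high-priority C finishes last (tick 11), behind the medium-priority B, because B keeps preempting A while A still holds R. With inheritance, A runs at C's priority until it releases R, and C finishes at tick 6.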
A test case must define the expected result or output
- otherwise, we will see what we want to see
A programmer should avoid testing their own program
- psychologically, does not want to find errors
- exception: unit testing / test-driven development
Corollary: the test group should not be the development group
We need test cases for valid and invalid inputs
Must test that the program does what is needed and doesn’t do what it shouldn’t
Keep and reuse test cases!
Don’t plan the test process assuming there won’t be errors!
The probability of finding errors in a piece of code is proportional to the number of errors already found

Software testing is an exercise in evaluating risks
The more errors you find, the more there still are
Pesticide paradox (Beizer): errors become resilient to tests (to find new errors, one needs new tests)
Not all errors found will be corrected
Product specifications are never definitive
Testers are not the most popular project team members :)
Software testing is a technical profession governed by a discipline

Sample brief test report [Marnie Hutcheson, Software Testing Fundamentals]
"As per our agreement, we have tested 67 percent of the test inventory [...] the most important tests in the inventory as determined by our joint risk analysis. The bug find rates and the severity composition of the bugs we found were within the expected range. Our bug fix rate is 85 percent. It has been three weeks since we found a Severity 1 issue. There are currently no known Severity 1 issues open. Fixes for the last Severity 2 issues were regression-tested and approved a week ago. Overall, the system seems to be stable. The load testing has been concluded. The system failed at 90 percent of the design load. The system engineers [...] will need 3 months to implement the fix. Our recommendation is to ship on schedule, with the understanding that we have an exposure if the system utilization exceeds the projections before we have a chance to install the previously noted fix."

[Cem Kaner, Black-box
software testing course, Florida Inst. of Tech.]
What do we test?
What do we want to achieve?
What is the testing?
How do we organize work to achieve the mission?
The test problem
When have we tested enough?
The problem of in testing

"A technical investigation conducted to provide quality-related information about a software product to a stakeholder" [Kaner]
- investigation: active, organized search for information
- technical: experiments, logic, models, algorithms, tools
- software product: everything the client gets (software, hardware, databases, documentation, etc.)
- stakeholders: those interested in the success of the product, and of testing

[Kaner 2003 - What is a good test case?]
- Find defects: especially in interesting parts (good coverage)
- Maximize bug count: in limited time
- Block premature release + help make the ship / no-ship decision
- Minimize technical support cost
- Assess conformance to specifications / rules / standards
- Minimize risk (incl. safety-related lawsuit risk)
- Find scenarios where the product works (despite bugs): workarounds
- Assess quality; but: cannot ensure quality just by testing
Testing cannot:
- verify correctness (absence of errors)
- ensure quality (QA is a process issue)

powerful: high chance of discovering a bug if present
credible / realistic (to stakeholders): no corner cases (except: safety!)
representative / likely to be encountered by the customer
easy to evaluate (is it a bug or not?)
easy to debug / informative
appropriately complex (progressive)
offer insight into some aspect of product / customer / environment (e.g., detect change in behavior / performance)

Marius Minea
September 29, 2016

Security of operating system + applications
- network security
- vulnerabilities and their prevention
- security of web applications
Cryptography
- foundational for all of security
Security protocols and their modeling
- authentication, key generation / exchange, etc.
- principles and tools for modeling and analysis

"Security is [...] preventing adverse consequences from the intentional and unwarranted actions of others" [Bruce Schneier, Beyond Fear]
"Computer Security deals with the prevention
and detection of unauthorized actions by users of a computer system" [D. Gollmann]
A security system prevents attacks; possibly also: detection, recovery, repair
Security deals with intentional actions; incidental actions: safety (≠ security)
unwarranted actions (from the victim's point of view); need not be illegal